In this paper, we focus on the scheduling problem in multi-channel wireless networks, e.g., the
downlink of a single cell in fourth generation (4G) OFDM-based cellular networks. Our goal is to design
efficient scheduling policies that can achieve provably good performance in terms of both throughput
and delay, at a low complexity. While a recently developed scheduling policy, called Delay Weighted
Matching (DWM), has been shown to be both rate-function delay-optimal (in the many-channel many-user
asymptotic regime) and throughput-optimal (in the general non-asymptotic setting), it has a high complexity
of $O(n^5)$, which makes it impractical for modern OFDM systems. To address this issue, we first develop a
simple greedy policy called Delay-based Queue-Side-Greedy (D-QSG) with a lower complexity of $O(n^3)$,
and rigorously prove that D-QSG not only achieves throughput optimality, but also guarantees near-optimal
rate-function-based delay performance. Specifically, the rate-function attained by D-QSG for
any fixed integer threshold $b > 0$ is no smaller than the maximum achievable rate-function by any
scheduling policy for threshold $b - 1$. Further, we develop another simple greedy policy called Delay-based
Server-Side-Greedy (D-SSG) with an even lower complexity of $O(n^2)$, and show that D-SSG achieves
the same performance as D-QSG. Thus, we are able to achieve a dramatic reduction in complexity (from
$O(n^5)$ of DWM to $O(n^2)$) with a minimal drop in the delay performance. Finally, we conduct numerical
simulations to validate our theoretical results in various scenarios. The simulation results show that
our proposed greedy policies not only guarantee a near-optimal rate-function, but also empirically are
virtually indistinguishable from the delay-optimal policy DWM.
I. Introduction

In this paper, we consider the scheduling problem in a multi-channel wireless network, where the
system has a large bandwidth that can be divided into multiple orthogonal sub-bands (or channels). A 
practically important example of such a multi-channel network is the downlink of a single cell of a fourth 
generation (4G) OFDM-based wireless cellular system (e.g., LTE and WiMax). In such a multi-channel 
system, a key challenge is to design efficient scheduling policies that can simultaneously achieve
high throughput and low delay. This problem becomes extremely critical in OFDM systems that are
expected to meet the dramatically increasing demands from multimedia applications with more stringent 
Quality-of-Service (QoS) requirements (e.g., voice and video applications), and thus look for new ways 
to achieve higher data rates, lower latencies, and a much better user experience. Yet, an even bigger
challenge is to design such high-performance scheduling policies at a low complexity. For example,
in OFDM systems, the Transmission Time Interval (TTI), within which the scheduling decisions need 
to be made, is typically on the order of a few milliseconds. On the other hand, there are hundreds of 
orthogonal channels that need to be allocated to hundreds of users. Hence, the scheduling decision has 
to be made within a very short scheduling cycle. 

We consider a single-cell multi-channel system consisting of n channels and a proportionally large 
number of users, with intermittent connectivity between each user and each channel. We assume that the 
Base Station (BS) maintains a separate First-in First-out (FIFO) queue associated with each user, which 
buffers the packets for the user to download. The delay performance that we focus on in this paper 
is the probability that the largest packet waiting time (or delay) in the system exceeds a certain fixed 
threshold. Such a probability can be estimated by its asymptotic decay-rate (called the rate-function in
large-deviations theory) when n becomes large. We refer to this setting as the many-channel many-user
asymptotic regime. 

A number of recent works have considered a multi-channel system similar to ours, but looked at 
delay from different perspectives. A line of works focused on queue-length-based metrics: average queue 
length [1] or queue-length rate-function in the many-channel many-user asymptotic regime [2]-[5]. In
[1], the authors focused on minimizing cost functions over a finite horizon, which includes minimizing
the expected total queue length as a special case. The authors showed that their goal can be achieved in
two special scenarios: 1) a simple two-user system, and 2) systems where fractional server allocation is
allowed. In [2]-[5], delay performance is evaluated by the queue-overflow probability, and its associated
rate-function, i.e., the asymptotic decay-rate of the probability that the largest queue length in the system
exceeds a fixed threshold. Although lH and HI proposed scheduling policies that can guarantee both
throughput optimality and rate-function optimality, they suffer from the following shortcomings. First, 
although the decay-rate of the queue-overflow probability may be mapped to that of the delay-violation 
probability when the arrival process is deterministic with a constant rate IS, this is not true in general, 
especially when the arrivals are correlated over time. Further, Q and [81 have shown through simulations 
that good queue-length performance does not necessarily imply good delay performance. Second, their 
results on rate-function optimality strongly rely on the assumptions that the arrival process is i.i.d. not 
only across users, but also in time, and that per-user arrival at any time is no greater than the largest 
channel rate. Third, even under this more restricted model, their proposed algorithms with rate-function 
optimality are of complexity at least 0{n^). For more general models, no algorithms with provable 
rate-function optimality are provided. 

Similar to this paper, another line of work Q directly focused on the delay performance rather than the 
queue-length performance. The performance of delay is often harder to characterize, because the delay in 
a queueing system often does not admit a Markovian representation, even for simple M/M/1 queues. The 
problem becomes even harder in a multi-user system with fading channels and interference constraints, 
since the service rate for individual queues becomes more unpredictable. In [7], the authors developed a
scheduling policy called Delay Weighted Matching (DWM), which maximizes the sum of the delay of the
packets scheduled in each time-slot. It has been shown that DWM is not only throughput-optimal, but also
rate-function delay-optimal in many cases (i.e., maximizing the delay rate-function, rather than the queue-length
rate-function as considered in [2]-[5]). However, DWM incurs a high complexity of $O(n^5)$, which renders
it impractical for modern OFDM systems with many channels and users (e.g., on the order of hundreds).
Hence, scheduling policies with a lower complexity are preferred in such multi-channel systems. 

This leads to the following natural but important questions: Can we find scheduling policies that 
have a significantly lower complexity, with comparable or only slightly worse performance? How much 
complexity can we reduce, and how much performance do we need to sacrifice? In this paper, we answer
these questions positively. Specifically, we develop low-complexity greedy policies that achieve both 
throughput optimality and rate-function near-optimality. 

We summarize our main contributions as follows. 

First, we propose a greedy scheduling policy, called Delay-based Queue-Side-Greedy (D-QSG), which
has a lower complexity of $O(n^3)$ compared to $O(n^5)$ of DWM. D-QSG, in an iterative manner, schedules the
oldest packets remaining in the system one-by-one whenever possible. We rigorously prove that D-QSG
not only achieves throughput optimality, but also guarantees a near-optimal rate-function. Specifically,
the rate-function attained by D-QSG for any fixed integer threshold $b > 0$ is not only positive but also
no smaller than the maximum achievable rate-function by any scheduling policy for threshold $b - 1$. We
obtain this result by comparing D-QSG with a new Greedy Frame-Based Scheduling (G-FBS) policy that
can exploit a key property of D-QSG. We show that the G-FBS policy guarantees a near-optimal rate-function,
and that D-QSG dominates G-FBS in every sample-path.

Second, we propose another greedy scheduling policy, called Delay-based Server-Side-Greedy (D-SSG),
which has an even lower complexity of $O(n^2)$. D-SSG, also in an iterative manner, allocates servers
one-by-one to serve a connected queue that has the largest head-of-line (HOL) delay. Note that the
queue-length-based counterpart of D-SSG, called Q-SSG, has been studied in ||3l, H. There, however, 
the authors were only able to prove a positive (queue-length) rate-function for restricted arrival processes 
that are i.i.d. not only across users, but also in time. In contrast, we show that D-SSG achieves
the same performance as D-QSG, by proving that D-SSG and D-QSG are sample-path equivalent under
certain tie-breaking rules. Thus, we are able to achieve a dramatic reduction in complexity (from $O(n^5)$
of DWM to $O(n^2)$) with a minimal drop in the delay performance.

Finally, we conduct numerical simulations to validate our theoretical results in various scenarios. 
Our simulation results show that our proposed greedy policies not only guarantee a near-optimal rate- 
function, but also empirically are virtually indistinguishable from the delay-optimal policy DWM. Further, 
the simulation results also show that D-SSG consistently outperforms its queue-length-based counterpart 
Q-SSG in all scenarios that we consider.

The remainder of the paper is organized as follows. In Section II, we describe the details of our system
model and performance metrics. In Section III, we derive an upper bound on the rate-function that can be
achieved by any scheduling policy. Then, in Sections IV and V, we present our main results on throughput
optimality and near-optimal rate-function for our proposed low-complexity greedy policies. Further, we
conduct numerical simulations in Section VI. Finally, we make concluding remarks in Section VII.

II. System Model 

We consider a discrete-time model for the downlink of a single-cell multi-channel wireless network
with n orthogonal channels and n users. In each time-slot, a channel can be allocated only to one user, but 
a user can be allocated with multiple channels simultaneously. As in /E1/-/E1/, E^yfor ease of presentation, 
we assume that the number of users is equal to the number of channels. Our rate-function delay analysis 
follows similarly if the number of users scales linearly with the number of channels. We let $Q_i$ denote the
FIFO queue associated with the $i$-th user, and let $S_j$ denote the $j$-th server. We consider the following
i.i.d. ON-OFF channel model that has also been used in the previous works (e.g., |[I1-|I11> Q). In such
a model, the connectivity between each queue and each server changes between ON and OFF from time
to time. We assume that the perfect channel state information (i.e., whether each channel is ON or OFF 
for each user in each time-slot) is known at the BS. This is a reasonable assumption in the downlink 
scenario of a single cell in a multi-channel cellular system with dedicated feedback channels. We also 
assume unit channel capacity, i.e., at most one packet from Qi can be served by Sj when the connectivity 
between Qi and Sj is ON. This assumption of unit channel capacity is made for ease of exposition, and 
our analysis can be readily extended to a 0-K channel model (where the channel capacity is K packets 
per time-slot when a channel is ON). Let $C_{i,j}(t)$ denote the connectivity between queue $Q_i$ and server $S_j$
in time-slot $t$. Then, $C_{i,j}(t)$ can be modeled as a Bernoulli random variable with a parameter $q \in (0, 1)$,
i.e.,

$$C_{i,j}(t) = \begin{cases} 1, & \text{with probability } q, \\ 0, & \text{with probability } 1 - q. \end{cases}$$

We assume that all the random variables $C_{i,j}(t)$ are i.i.d. across all the variables $i$, $j$ and $t$. Such a network
can be modeled as a multi-queue multi-server system with stochastic connectivity, as shown in Fig. 1.
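
The i.i.d. ON-OFF channel model above can be illustrated with a minimal Python sketch. The function name `sample_connectivity` and the list-of-lists representation are illustrative assumptions, not part of the original model description; the sketch simply draws one connectivity matrix for a single time-slot.

```python
import random

def sample_connectivity(n, q, rng=None):
    """Draw one time-slot of the i.i.d. ON-OFF channel model.

    Returns an n-by-n matrix C where C[i][j] is 1 (ON) with probability q
    and 0 (OFF) otherwise, independently for every queue i and server j.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the comparison is true w.p. q.
    return [[1 if rng.random() < q else 0 for _ in range(n)] for _ in range(n)]

if __name__ == "__main__":
    for row in sample_connectivity(4, 0.5, random.Random(0)):
        print(row)
```

Calling the function once per time-slot with a fresh draw reproduces the assumption that connectivity is also independent across time.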

As in the previous works |[ll-||3l^ iH, the above i.i.d. ON-OFF channel model is a simplification, and
is assumed only for the analytical results. The ON-OFF model is a good approximation when the BS 
transmits at a fixed achievable rate if the SINR level is above a certain threshold at the receiver, and does 
not transmit otherwise. The sub-bands being i.i.d. is a reasonable assumption when the channel width is 
larger than the coherence bandwidth of the environment. Moreover, we believe that our results obtained 
for this simple channel model can provide useful insights for more general models. Indeed, we will show 
through simulations that our proposed greedy policies also perform well in more general models, e.g., 
accounting for heterogeneous (near- and far-)users and time-correlated channels. Further, we will briefly 
discuss how to design efficient scheduling policies in general scenarios towards the end of this paper. 

We present more notations used in this paper as follows. Let $A_i(t)$ denote the number of packet arrivals
to queue $Q_i$ in time-slot $t$. Let $A(t) = \sum_{i=1}^{n} A_i(t)$ denote the cumulative arrivals to the entire system in
time-slot $t$, and let $A(t_1, t_2) = \sum_{\tau=t_1}^{t_2} A(\tau)$ denote the cumulative arrivals to the system from time $t_1$ to
$t_2$. We let $\lambda_i$ denote the mean arrival rate to queue $Q_i$, and let $\lambda = [\lambda_1, \lambda_2, \dots, \lambda_n]$ denote the arrival
rate vector. We assume that packets arrive at the beginning of a time-slot, and depart at the end of a
time-slot. We use $Q_i(t)$ to denote the length of queue $Q_i$ at the beginning of time-slot $t$ immediately
after packet arrivals. Queues are assumed to have an infinite buffer capacity. Let $Z_{i,l}(t)$ denote the delay
of the $l$-th packet at queue $Q_i$ at the beginning of time-slot $t$, which is measured from the time when
the packet arrived at queue $Q_i$ until the beginning of time-slot $t$. Note that at the end of each time-slot,
the packets that are still present in the system will have their delays increased by one due to the elapsed
time. Further, let $W_i(t) = Z_{i,1}(t)$ denote the HOL delay of queue $Q_i$ at the beginning of time-slot $t$.
Finally, we define $(x)^+ = \max(x, 0)$, and use $\mathbb{1}_{\{\cdot\}}$ to denote the indicator function.

¹Throughout this paper, we use the terms "user" and "queue" interchangeably, and use the terms "channel" and "server"
interchangeably.

Fig. 1. System model. The connectivity between each pair of queue $Q_i$ and server $S_j$ is "ON" (denoted by a solid line) with
probability $q$, and "OFF" (denoted by a dashed line) otherwise.

We now state the assumptions on the arrival processes. The throughput analysis is carried out under 
the following mild assumption, which has also been used in [9].

Assumption 1: For each user $i \in \{1, 2, \dots, n\}$, the arrival process $A_i(t)$ is an irreducible and positive
recurrent Markov chain with countable state space, and satisfies the Strong Law of Large Numbers: That
is, with probability one,

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} A_i(\tau) = \lambda_i.$$

We also assume that the arrival processes are mutually independent across users (which can be relaxed
for throughput analysis as discussed in [9]).

The following two assumptions are also used in the previous work [7] on the rate-function delay
analysis. 

Assumption 2: There exists a finite $L$ such that $A_i(t) \le L$ for any $i$ and $t$, i.e., instantaneous arrivals
are bounded. 

Assumption 3: The arrival processes are i.i.d. across users, and $\lambda_i = p$ for any user $i$. Given any $\epsilon > 0$
and $\delta > 0$, there exists $T > 0$, $N > 0$, and a positive function $I_B(\epsilon, \delta)$ independent of $n$ and $t$ such that

for all $t > T$ and $n > N$.

Assumption 2 requires that the arrivals in each time-slot have bounded support, which is indeed true
for real systems. Assumption 3 is also very general, and can be viewed as a result of the statistical
multiplexing effect of a large number of sources. Assumption 3 holds for i.i.d. arrivals and arrivals
driven by two-state Markov chains (that can be correlated over time) as two special cases. 

A. Performance Objectives 

In this paper, we consider two performance metrics: 1) the throughput and 2) the rate-function of the 
probability that the largest packet delay in the system exceeds a certain fixed threshold in the many- 
channel many-user asymptotic regime. 

We first define the optimal throughput region (or stability region) of the system for any fixed integer
$n > 0$ under Assumption 1. As in prior work, a stochastic queueing network is said to be stable if it can be
described as a discrete-time countable Markov chain and the Markov chain is stable in the following sense: 
The set of positive recurrent states is nonempty, and it contains a finite subset such that with probability 
one, this subset is reached within finite time from any initial state. When all the states communicate, 
stability is equivalent to the Markov chain being positive recurrent [10]. The throughput region of a
scheduling policy is defined as the set of arrival rate vectors for which the network remains stable under
this policy. Then, the optimal throughput region is defined as the union of the throughput regions of all
possible scheduling policies, which is denoted by $\Lambda^*$. A scheduling policy is throughput-optimal if it can
stabilize any arrival rate vector strictly inside $\Lambda^*$. For more discussion of the optimal throughput
region $\Lambda^*$ in our multi-channel systems, please refer to Appendix A.

Next, we consider the probability that the largest packet delay in the system exceeds a certain fixed 
threshold, and its rate-function in the many-channel many-user asymptotic regime. Let $W(t)$ denote the
largest HOL delay over all the queues (i.e., the largest packet delay in the system) at the beginning of
time-slot $t$, i.e., $W(t) = \max_{1 \le i \le n} W_i(t)$. Assuming that the system is stationary and ergodic, we define
the rate-function $I(b)$ as the asymptotic decay-rate of the probability that the largest packet delay exceeds
any fixed integer threshold $b > 0$, as the system size $n$ goes to infinity, i.e.,

$$I(b) = \lim_{n \to \infty} -\frac{1}{n} \log P(W(0) > b). \tag{2}$$

Note that once we know this rate-function, we can then estimate the delay-violation probability using
$P(W(0) > b) \approx \exp(-n I(b))$. The estimate tends to be more accurate as $n$ becomes larger. Clearly,
for systems with a large $n$, a larger value of the rate-function implies a better delay performance, i.e.,
a smaller probability that the largest packet delay in the system exceeds a certain threshold. We define
the optimal rate-function as the maximum achievable rate-function over all possible scheduling policies,
which is denoted by $I^*(b)$. A scheduling policy is rate-function delay-optimal if it achieves the optimal
rate-function $I^*(b)$ for any fixed integer threshold $b > 0$.
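
As a small numerical illustration of the estimate $P(W(0) > b) \approx \exp(-n I(b))$, the following Python sketch evaluates it for hypothetical values. The function name and the sample rate-function values are assumptions chosen only for illustration; they are not results from this paper.

```python
import math

def delay_violation_estimate(n, rate_function_value):
    """Large-deviations estimate exp(-n * I(b)) of P(W(0) > b)."""
    return math.exp(-n * rate_function_value)

if __name__ == "__main__":
    n = 100  # hypothetical system size
    for rate in (0.01, 0.05, 0.10):  # hypothetical values of I(b)
        print(rate, delay_violation_estimate(n, rate))
```

For a fixed $n$, the estimate decreases as the rate-function value grows, which is why a larger rate-function corresponds to better delay performance.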

III. An Upper Bound on The Rate-Function 

In this section, we derive an upper bound on the rate-function that can be achieved by any scheduling 
algorithm. 

Let $I_{AG}(t, x)$ denote the asymptotic decay-rate of the probability that in any interval of $t$ time-slots,
the total number of packet arrivals is greater than $n(t + x)$, as $n$ tends to infinity, i.e.,

$$I_{AG}(t, x) = \liminf_{n \to \infty} -\frac{1}{n} \log P(A(-t + 1, 0) > n(t + x)).$$

Let $I_{AG}(x)$ be the infimum of $I_{AG}(t, x)$ over all $t > 0$, i.e.,

$$I_{AG}(x) = \inf_{t > 0} I_{AG}(t, x).$$

Also, we define $I_X \triangleq \log \frac{1}{1 - q}$.

Theorem 1: Given the system model described in Section II, for any scheduling algorithm, we have

$$\limsup_{n \to \infty} -\frac{1}{n} \log P(W(0) > b) \le \min\Big\{(b + 1) I_X, \min_{0 \le c \le b} \{I_{AG}(b - c) + c I_X\}\Big\} \triangleq I_U(b).$$

Theorem 1 can be shown by considering two events that lead to $\{W(0) > b\}$, and computing their
probabilities and decay-rates. We provide the proof in Appendix B.

Remark: Theorem 1 implies that $I_U(b)$ is an upper bound on the rate-function that can be achieved
by any scheduling policy. Hence, even for the optimal rate-function $I^*(b)$, we must have $I^*(b) \le I_U(b)$
for any fixed integer threshold $b > 0$.
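
The structure of the upper bound in Theorem 1 can be sketched numerically. The Python sketch below assumes that $I_X = \log \frac{1}{1-q}$, that $c$ ranges over the integers $0, 1, \dots, b$, and that the arrival decay-rate $I_{AG}(\cdot)$ is supplied by the caller as a function. The names `i_x`, `upper_bound` and the sample `i_ag` functions are hypothetical and only serve to show how the two terms of the minimum interact.

```python
import math

def i_x(q):
    """Assumed definition I_X = log(1 / (1 - q)) for ON-probability q in (0, 1)."""
    return math.log(1.0 / (1.0 - q))

def upper_bound(b, q, i_ag):
    """Evaluate min{(b+1) I_X, min over c in 0..b of [I_AG(b-c) + c I_X]}.

    i_ag is a caller-supplied function x -> I_AG(x); c is taken to be an
    integer, which is an assumption of this sketch.
    """
    ix = i_x(q)
    mixed = min(i_ag(b - c) + c * ix for c in range(b + 1))
    return min((b + 1) * ix, mixed)

if __name__ == "__main__":
    # Hypothetical arrival decay-rate, used only to exercise the formula.
    print(upper_bound(2, 0.5, lambda x: float(x)))
```

When $I_{AG}$ is very large (bursts are extremely unlikely), the bound is governed by the connectivity term $(b + 1) I_X$; when $I_{AG}$ is small, the arrival term dominates.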

In [7], the authors proposed the Delay Weighted Matching (DWM) policy that is rate-function delay-optimal
and achieves upper-bound $I_U(b)$ in many cases. However, it suffers from a high complexity of $O(n^5)$.
Specifically, DWM requires computing a maximum-weight matching over a bipartite graph $G[V, E]$ with
$|V| = O(n^2)$ and $|E| = O(n^3)$, which has a complexity of $O(|V||E| + |V|^2 \log |V|) = O(n^5)$ in general

im. 
IV. Delay-based Queue-Side-Greedy (D-QSG)

In this section, we develop a simple greedy scheduling policy called Delay-based Queue-Side-Greedy 
(D-QSG). D-QSG, in an iterative manner, schedules the oldest packets in the system one-by-one whenever 
possible. In this sense, D-QSG can be viewed as an approximation of First-Come First-Serve (FCFS) 
policy, which has been known to be delay-optimal in many systems (e.g., a single-server queue) Q. 
We will show that D-QSG not only achieves throughput optimality, but also guarantees a near-optimal 
rate-function, at a complexity of $O(n^3)$.

A. Algorithm Description 

We start by presenting some additional notations. In the D-QSG policy, there are at most $n$ rounds
in each time-slot $t$. Let $Q_i^k(t)$, $Z_{i,l}^k(t)$ and $W_i^k(t) = Z_{i,1}^k(t)$ denote the length of queue $Q_i$, the delay
of the $l$-th packet of $Q_i$, and the HOL delay of $Q_i$ after the $k$-th round in time-slot $t$, respectively. In
particular, we have $Q_i^0(t) = Q_i(t)$, $Z_{i,l}^0(t) = Z_{i,l}(t)$, and $W_i^0(t) = W_i(t)$. Let $\Gamma_k(t)$ denote the set
of indices of the available servers at the beginning of the $k$-th round, and let $\Psi_k(t)$ denote the set of
queues that have the largest HOL delay among all the queues that are connected to at least one server
in $\Gamma_k(t)$ at the beginning of the $k$-th round, i.e.,

$$\Psi_k(t) = \Big\{1 \le i \le n \;\Big|\; W_i^{k-1}(t) \cdot \mathbb{1}_{\{\sum_{j \in \Gamma_k(t)} C_{i,j}(t) > 0\}} = \max_{1 \le l \le n} W_l^{k-1}(t) \cdot \mathbb{1}_{\{\sum_{j \in \Gamma_k(t)} C_{l,j}(t) > 0\}}\Big\}.$$

Also, let $i(k, t)$ be the index of the queue that is served in the
$k$-th round of time-slot $t$, and let $j(k, t)$ be the index of the server that serves $Q_{i(k,t)}$ in that round. We
then specify the operations of D-QSG as follows.
Delay-based Queue-Side-Greedy (D-QSG) policy: In each time-slot $t$,

1) Initialize $k = 1$ and $\Gamma_1 = \{1, 2, \dots, n\}$.

2) In the $k$-th round, allocate server $S_{j(k,t)}$ to $Q_{i(k,t)}$, where

$$i(k, t) = \min\{i \mid i \in \Psi_k(t)\},$$

$$j(k, t) = \min\{j \in \Gamma_k(t) \mid C_{i(k,t),j}(t) = 1\}.$$

That is, in the $k$-th round, we consider the queues that have the largest HOL delay among those
that have at least one available server connected (i.e., the queues in set $\Psi_k(t)$), and break ties by
picking the queue with the smallest index (i.e., $Q_{i(k,t)}$). We then choose an available server that is
connected to queue $Q_{i(k,t)}$, breaking ties by picking the server with the smallest index (i.e., server
$S_{j(k,t)}$), to serve $Q_{i(k,t)}$. At the end of the $k$-th round, update the length of $Q_{i(k,t)}$ to account
for service, i.e., set $Q_{i(k,t)}^k(t) = \big(Q_{i(k,t)}^{k-1}(t) - C_{i(k,t),j(k,t)}(t)\big)^+$ and $Q_i^k(t) = Q_i^{k-1}(t)$ for all
$i \ne i(k, t)$. Also, update the HOL delay of $Q_{i(k,t)}$ by setting $W_{i(k,t)}^k(t) = Z_{i(k,t),1}^k(t) = Z_{i(k,t),2}^{k-1}(t)$
if $Q_{i(k,t)}^k(t) > 0$, and $W_{i(k,t)}^k(t) = 0$ otherwise, and setting $W_i^k(t) = W_i^{k-1}(t)$ for all $i \ne i(k, t)$.

3) Stop if $k$ equals $n$. Otherwise, increase $k$ by 1, set $\Gamma_k(t) = \Gamma_{k-1}(t) \setminus \{j(k - 1, t)\}$, and repeat step 2.

Remark: D-QSG has a complexity of $O(n^3)$, since there are at most $n$ rounds, and in each round, it takes
$O(n^2 + n) = O(n^2)$ time to find a queue that has at least one connected and available server (which
takes $O(n^2)$ time to check for all queues) and that has the largest HOL delay (which takes $O(n)$ time
to compare). It should be noted that in each round, when there are multiple queues that have the largest 
HOL delay, D-QSG chooses the queue with the smallest index; when there are multiple available servers 
that are connected to the chosen queue, D-QSG allocates the server with the smallest index. We specify 
such a tie-breaking rule for ease of analysis. In practice, we can also break ties arbitrarily. 
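
The D-QSG rounds described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: queues are represented as lists of packet delays in FIFO order, the function name `dqsg_slot` is hypothetical, indices are 0-based, empty queues are skipped, and the loop stops early once no non-empty queue has a connected available server (which yields the same schedule as running all $n$ rounds).

```python
def dqsg_slot(queues, conn):
    """Run one time-slot of a D-QSG-style greedy allocation.

    queues[i]: packet delays of queue i in FIFO order (index 0 is the HOL packet).
    conn[i][j]: 1 if queue i is connected to server j in this slot, else 0.
    Returns (schedule, remaining): the (queue, server) pairs in the order
    chosen, and a copy of the queues after the served packets are removed.
    """
    n_servers = len(conn[0])
    remaining = [list(q) for q in queues]  # leave the caller's data untouched
    available = set(range(n_servers))
    schedule = []
    for _ in range(n_servers):  # at most one round per server
        best_i, best_w = None, -1
        for i, q in enumerate(remaining):
            if not q or not any(conn[i][j] for j in available):
                continue  # empty, or no connected server is still available
            if q[0] > best_w:  # strict '>' keeps the smallest index on ties
                best_i, best_w = i, q[0]
        if best_i is None:
            break  # nothing more can be served in this slot
        j = min(j for j in available if conn[best_i][j])  # smallest server index
        schedule.append((best_i, j))
        remaining[best_i].pop(0)  # serve the HOL packet
        available.remove(j)
    return schedule, remaining

if __name__ == "__main__":
    queues = [[3, 1], [5], [5, 2]]
    conn = [[1, 1, 0], [0, 1, 1], [0, 1, 0]]
    print(dqsg_slot(queues, conn))
```

In this example, queues 1 and 2 tie on HOL delay; the smallest-index rule gives server 1 to queue 1, after which queue 2 has no connected server left. This illustrates that a greedy policy without back-tracking can leave an old packet unserved, which is the effect the analysis in the next subsection has to control.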

B. Near-optimal Delay Performance 

In this section, we present the main result of this paper on near-optimal rate-function. We first define 
near-optimal rate-function, and then evaluate the delay performance of D-QSG. 

A policy P is said to achieve near-optimal rate-function if the delay rate-function $I(b)$ attained by
policy P for any fixed integer threshold $b > 0$ is no smaller than $I^*(b - 1)$, the optimal rate-function
for threshold $b - 1$. That is,

$$I(b) = \liminf_{n \to \infty} -\frac{1}{n} \log P(W(0) > b) \ge I^*(b - 1). \tag{3}$$

We next present our main result in the following theorem, which states that D-QSG achieves a near- 
optimal rate-function. 

Theorem 2: Under Assumptions 2 and 3, D-QSG achieves a near-optimal rate-function, as given in
(3).

We prove Theorem 2 by the following strategy: 1) motivated by a key property of D-QSG (Lemma 1),
we propose the Greedy Frame-Based Scheduling (G-FBS) policy, which is a variant of the FBS policy
in [7] that has been shown to be rate-function delay-optimal in many cases; 2) show that G-FBS
achieves a near-optimal rate-function (Theorem 3); 3) prove a dominance property of D-QSG over G-FBS.
Specifically, in Lemma 2, we show that for any given sample path, by the end of each time-slot, D-QSG
has served every packet that G-FBS has served. Note that Theorem 2 holds for D-QSG with any
tie-breaking rule under which, when allocating a server to a queue, it does not account for the connectivity
between this server and the other queues. The performance of D-QSG may be further improved, if a
better tie-breaking rule is applied.
We now present a crucial property of D-QSG in Lemma 1, which is the key to proving the rate-function
near-optimality for G-FBS and D-QSG.

Lemma 1: Consider any $n$ packets and any strictly increasing function $f(n) < \frac{n}{2}$. Suppose that D-QSG
is applied to schedule these $n$ packets. Then, there exists a finite integer $N_X > 0$ such that for
all $n > N_X$, with probability no smaller than $1 - 2(1 - q)^{n - 2f(n)}$, D-QSG schedules at least $n - 2\sqrt{n}$
packets, including the oldest $f(n)$ packets.

We provide the proof of Lemma 1 in Appendix C, and explain the importance of Lemma 1 as follows.
We first recall how DWM is shown to be rate-function delay-optimal in [7]. Specifically, the authors of
[7] compare DWM with another policy FBS. In FBS, packets are filled into frames with size $n - H$ in a
FCFS manner, where $H$ is a suitably chosen constant independent of $n$. The FBS policy attempts to serve
the entire HOL frame whenever possible. The authors of [7] first establish the rate-function optimality
of the FBS policy. Then, by showing that DWM dominates FBS (i.e., DWM will serve the same packets
in the entire HOL frame whenever possible), the delay optimality of DWM then follows.

However, this comparison approach will not work directly for D-QSG. In order to serve all packets 
in a frame whenever possible, one would need certain back-tracking (or rematching) operations as in a 
typical maximum-weight matching algorithm like DWM. For a simple greedy algorithm like D-QSG that 
does not do back-tracking, it is unlikely to attain the same probability of serving the entire frame. In 
fact, even if we reduce the maximum frame size to $n - 2\sqrt{n}$, we are still unable to show that D-QSG
can serve the entire frame with a sufficiently high probability. Thus, we cannot compare D-QSG with
FBS as in [7].

Fortunately, Lemma 1 provides an alternate avenue. Specifically, for a frame of size $n$, even though
D-QSG may not serve any given subset of $n - 2\sqrt{n}$ packets with a sufficiently high probability, it will
serve some subset of $n - 2\sqrt{n}$ packets with a sufficiently high probability. Further, this subset must
contain the oldest $2\sqrt{n}$ packets for a large $n$, if we choose $f(n)$ in Lemma 1 such that $f(n) \in \omega(\sqrt{n})$.
Note that D-QSG still leaves (at most) $2\sqrt{n}$ packets to the next time-slot. In the next time-slot, if we can
make sure that D-QSG serves all of these $2\sqrt{n}$ leftover packets, which also happen to be the oldest, we
would then at worst suffer an additional one-time-slot delay. Intuitively, we would then be able to show 
that D-QSG attains a near-optimal delay rate-function. 

To make this argument rigorous, we next compare D-QSG with a new policy called Greedy Frame- 
Based Scheduling (G-FBS). Note that G-FBS is only for assisting our analysis, and will not be used as 
an actual scheduling algorithm. In the G-FBS policy, packets are grouped into frames. Each frame has a 
capacity of $n_0 = n - 2\sqrt{n}$ packets, i.e., at most $n_0$ packets can be filled into a frame. As packets arrive
to the system in each time-slot, the frames are created by filling the packets sequentially. Specifically,
packets that arrive earlier are filled into the frame with a higher priority, and packets from queues with a
smaller index are filled with a higher priority when multiple packets arrive in the same time-slot. Once
the current frame is fully filled, it will be closed and a new frame will be opened. We also assume that
there is a "leftover" frame, called L-frame for simplicity, with a capacity of $2\sqrt{n}$ packets. The L-frame
is for storing the packets that are not served in the previous time-slot and are carried over to the current
time-slot. At the beginning of each time-slot, we combine the HOL frame and the L-frame into a "super"
frame, called S-frame for simplicity, with a capacity of $n$ packets. If there are fewer than $n$ packets in the
S-frame, we can artificially add some dummy packets with a delay of zero at the end of the S-frame
so that the S-frame is fully filled. In each time-slot, G-FBS runs the D-QSG policy, but restricted to
only the $n$ packets of the S-frame. We call it a success if D-QSG can schedule at least $n_0$ packets,
including the oldest $f(n)$ packets, from the S-frame, where $f(n) < \frac{n}{2}$ is any function that satisfies
$f(n) \in o(n)$ and $f(n) \in \omega(\sqrt{n})$. In each time-slot, if a success does not occur, then no packets will be
served. When there is a success, the G-FBS policy serves all the packets that are scheduled by D-QSG
restricted to the S-frame in that time-slot. Lemma 1 implies that in each time-slot, a success occurs with
probability at least $1 - 2(1 - q)^{n - 2f(n)}$. When there is a success, all packets from the S-frame, except
for at most $2\sqrt{n} = n - n_0$ packets, are successfully served, and these served packets include the oldest
$f(n)$ packets. The packets that are not served will be stored in the L-frame, and carried over to the next
time-slot (except for the dummy packets, which will be discarded).
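
To get a feel for the quantities used by G-FBS, the Python sketch below evaluates the frame capacity $n_0 = n - 2\sqrt{n}$ and the per-slot success-probability lower bound $1 - 2(1 - q)^{n - 2f(n)}$ implied by Lemma 1. The choice $f(n) = n^{3/4}$ is a hypothetical example that satisfies $f(n) \in o(n)$ and $f(n) \in \omega(\sqrt{n})$, and stays below $n/2$ once $n > 16$; the function names are also assumptions of this sketch. Recall that the lemma only asserts the bound for sufficiently large $n$.

```python
import math

def frame_capacity(n):
    """Frame capacity n_0 = n - 2*sqrt(n) used by G-FBS."""
    return n - 2.0 * math.sqrt(n)

def example_f(n):
    """Hypothetical f(n) = n^(3/4): grows faster than sqrt(n), slower than n."""
    return n ** 0.75

def success_lower_bound(n, q, f=example_f):
    """Lower bound 1 - 2(1-q)^(n - 2 f(n)) on the per-slot success probability."""
    return 1.0 - 2.0 * (1.0 - q) ** (n - 2.0 * f(n))

if __name__ == "__main__":
    for n in (32, 100, 400):
        print(n, frame_capacity(n), success_lower_bound(n, 0.5))
```

For small $n$ the bound can be vacuous (even negative), whereas for moderately large $n$ it approaches 1 quickly, reflecting that a failure to achieve a success is a rare event.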

Remark: Although G-FBS is similar to the FBS policy [7], it exhibits a key difference from FBS. In the
FBS policy, in each time-slot, either an entire frame (i.e., all the packets in the frame) will be completely
served or none of its packets will be served. Hence, it does not allow packets to be carried over to the
next frame. In contrast, G-FBS allows leftover packets and is thus more flexible in serving frames. This
property is the key reason that we can use a lower-complexity policy (like D-QSG). On the other hand,
it leads to a small gap between the rate-functions achieved by G-FBS and delay-optimal policies (e.g.,
FBS and DWM). Nonetheless, this gap can be well characterized by using Lemma 1. Specifically, in the
G-FBS policy, an L-frame contains at most $2\sqrt{n}$ packets, because at most $2\sqrt{n}$ packets are not served
whenever there is a success. Further, these (at most) $2\sqrt{n}$ leftover packets will be among the oldest $f(n)$
packets (in the S-frame) in the next time-slot when $n$ is large, due to our choice of $f(n) \in \omega(\sqrt{n})$.
Hence, another success will serve all the leftover packets. This implies that at most $x + 1$ successes are
needed to completely serve $x$ frames, for any finite integer $x > 0$. In fact, this property is the key reason
for a one-time-slot shift in the guaranteed rate-function by G-FBS, which leads to the near-optimal delay
performance, as we show in the following theorem.

Theorem 3: Under Assumptions 2 and 3, the G-FBS policy achieves a near-optimal rate-function, as given
in (3).

The proof of Theorem 3 follows a similar line of argument as in the proof for rate-function delay
optimality of FBS (Theorem 2 in Q). We consider all the events that lead to the delay-violation event
$\{W(0) > b\}$, which can be caused by two factors: bursty arrivals and sluggish service. On the one hand,
if the number of arrivals in a certain period, say of length t time-slots, exceeds the
maximum number of packets that can be served in a period of t+b+1 time-slots, then it unavoidably leads
to a delay violation. On the other hand, suppose that there is at least one packet arrival at a certain time,
and that under G-FBS, a success does not occur in any of the following b+1 time-slots (including the
time-slot when the packet arrives); then it also leads to a delay violation. Each of these two possibilities
has a corresponding rate-function for its probability of occurring. Large-deviations theory then tells us that
the rate-function for delay violation is determined by the smallest rate-function among these possibilities
(i.e., "rare events occur in the most likely way"). We can then show that $I(b) \ge I_U(b-1) \ge I^*(b-1)$
for any integer b > 0, where $I(\cdot)$ is the rate-function attained by G-FBS, $I_U(\cdot)$ is the upper bound that
we derived in Section III, and $I^*(\cdot)$ is the optimal rate-function. We provide the detailed
proof of Theorem 3 in Appendix D.

Remark: Note that the gap between the optimal rate-function and the above near-optimal rate-function
is likely to be quite small. For example, in the special case of i.i.d. 0-1 arrivals, the near-optimal
rate-function implies that $I(b) \ge \frac{b}{b+1} I_U(b) \ge \frac{b}{b+1} I^*(b)$, since we can compute that
$I_U(b) = (b+1)\log\frac{1}{1-q}$ for this special case.
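
The size of the gap in this special case can be worked out in one line. The derivation below is a sketch under the assumption that, for i.i.d. 0-1 arrivals, the upper bound takes the form $I_U(b) = (b+1)\log\frac{1}{1-q}$; that closed form is partly reconstructed from the garbled source and should be checked against the original.

```latex
\begin{align*}
I(b) \;\ge\; I_U(b-1)
     \;=\; b \log\frac{1}{1-q}
     \;=\; \frac{b}{b+1}\,(b+1)\log\frac{1}{1-q}
     \;=\; \frac{b}{b+1}\, I_U(b)
     \;\ge\; \frac{b}{b+1}\, I^*(b).
\end{align*}
```

For instance, with threshold b = 4 the guaranteed rate-function would be at least 4/5 of the optimal one, and the ratio approaches 1 as b grows.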

Finally, we make use of the following dominance property of D-QSG over G-FBS. 
Lemma 2: For any given sample path, by the end of any time-slot, D-QSG has served every packet 
that G-FBS has served. 

We prove Lemma 2 by induction, and provide the proof in Appendix E. Then, the near-optimal
rate-function of D-QSG (Theorem 2) follows immediately from Lemma 2 and Theorem 3.

C. Throughput Optimality 

In this section, we establish throughput optimality of D-QSG. Note that the rate-function is studied 
in the asymptotic regime, i.e., when n goes to infinity. Hence, even if the convergence rate of the rate- 
function is fast, the throughput performance may be poor for small to moderate values of n. As a matter 
of fact, a rate-function delay-optimal policy may not even be throughput-optimal for a fixed n (e.g.,
FBS). To this end, we are also interested in the throughput performance of scheduling policies in the general
non-asymptotic regime (i.e., in a multi-channel system with any fixed value of n).

It is well-known that the Max Weight Scheduling policy El, [12]-[14] that maximizes the weighted
sum of the rates (where the weight is either queue length or delay) is throughput-optimal in very general
settings, including the multi-channel system that we consider in this paper. Hence, we first discuss a
simple extension of the Delay-based MaxWeight Scheduling (D-MWS) policy lEl, S, US, IHl for our
multi-channel system.

Let $\mathcal{S}_j(t)$ denote the set of queues that are connected to server $S_j$ in time-slot t, i.e.,
$\mathcal{S}_j(t) = \{1 \le i \le n \mid C_{i,j}(t) = 1\}$, and let $\Gamma_j(t)$ denote the subset of queues in $\mathcal{S}_j(t)$ that have the largest HOL delay
in time-slot t, i.e., $\Gamma_j(t) = \{i \in \mathcal{S}_j(t) \mid W_i(t) = \max_{l \in \mathcal{S}_j(t)} W_l(t)\}$. We then specify the operations of
D-MWS as follows.

Delay-based MaxWeight Scheduling (D-MWS) policy: In each time-slot t, the scheduler allocates server
$S_j$ to serve queue $Q_{i(j,t)}$ such that $i(j,t) = \min\{i \mid i \in \Gamma_j(t)\}$. That is, each server chooses to serve a
connected queue that has the largest HOL delay, breaking ties by picking the queue with the smallest
index when there are multiple such queues.
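
The D-MWS rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `d_mws`, the 0-based indices, and the convention of returning -1 for a server with no connected queue are assumptions made here, not part of the paper.

```python
from typing import List

def d_mws(hol_delay: List[int], conn: List[List[int]]) -> List[int]:
    """For each server j, return the index of the queue it serves under D-MWS.

    hol_delay[i] : HOL delay W_i(t) of queue i at the start of the time-slot
    conn[i][j]   : 1 if queue i is connected to server j in this time-slot, else 0
    Returns -1 for a server that has no connected queue (a convention assumed here).
    """
    n_queues = len(hol_delay)
    n_servers = len(conn[0]) if conn else 0
    schedule = []
    for j in range(n_servers):
        connected = [i for i in range(n_queues) if conn[i][j] == 1]
        if not connected:
            schedule.append(-1)
            continue
        largest = max(hol_delay[i] for i in connected)
        # Tie-breaking: smallest index among the queues with the largest HOL delay.
        schedule.append(min(i for i in connected if hol_delay[i] == largest))
    return schedule

# With full connectivity every server independently picks the same queue.
print(d_mws([3, 5, 5], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]))
```

Because all servers decide from the same start-of-slot delays, they tend to pile onto the same queue.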

Remarks: We can prove throughput-optimality of D-MWS in our multi-channel system, using fluid
limit techniques along the same line of analysis as in [9] for a single-channel system. The key insight
we obtained from the proof in Q is that to achieve throughput optimality, it is sufficient for each server
to serve a connected queue that has the largest weight in the fluid limits rather than in the original system.

Using the insight obtained above, we next show that D-QSG is throughput-optimal in general non- 
asymptotic settings (for a system with any fixed n). 

Theorem 4: D-QSG policy is throughput-optimal under Assumption 1.

We prove Theorem 4 using the fluid limit techniques [9], [15]. Different from the D-MWS policy, under
which each server chooses to serve a connected queue with the largest HOL delay, D-QSG allocates
servers to serve the oldest packets first, one-by-one in an iterative manner. Hence, we can show that the
operations of D-QSG guarantee that each server chooses a connected queue that has a large enough
weight, and that in the fluid limits the weight of the queue chosen by each server is equal to that of the
queue chosen under D-MWS. Then, we complete the proof of Theorem 4 following a similar line of
analysis as in [9]. We provide the detailed proof in Appendix F.

So far, we have shown that D-QSG not only achieves a near-optimal rate-function, but also guarantees
throughput optimality, with a lower complexity $O(n^3)$ than that of DWM. Interestingly, we will show next
that just by switching the order of examining the servers or the queues first, we can obtain another policy
that not only achieves the same performance of throughput optimality and rate-function near-optimality
as that of D-QSG, but also incurs an even lower complexity $O(n^2)$.

V. Delay-based Server-Side-Greedy (D-SSG) 

In this section, we develop another greedy scheduling policy called Delay-based Server-Side-Greedy 
(D-SSG), under which each server iteratively chooses to serve a connected queue that has the largest HOL 
delay. We show that D-SSG is equivalent to D-QSG under certain tie-breaking rules, in the sample-path 
sense, and thus achieves the same performance of throughput optimality and rate-function near-optimality 
as that of D-QSG. Further, D-SSG has an even lower complexity $O(n^2)$.

Before we describe the detailed operations of D-SSG, we would like to remark on D-MWS due to the
similarity between D-MWS and D-SSG. Note that D-MWS is not only throughput-optimal, but also has
a low complexity $O(n^2)$. However, we can show that D-MWS suffers from poor delay performance.
Specifically, following a similar line of argument as in the proof of Theorem 3 in [3], we can show
that D-MWS yields a zero rate-function in certain scenarios (e.g., with i.i.d. 0-1 arrivals). We omit the
proof here, and explain the intuition behind it as follows. Under D-MWS, each server chooses to serve 
a connected queue that has the largest HOL delay without accounting for the decisions of the other 
servers. This way of allocating servers leads to an unbalanced schedule. That is, only a small fraction of 
the queues get served in each time-slot. This inefficiency essentially leads to poor delay performance. 

Now, we describe the operations of our proposed D-SSG policy. D-SSG is similar to D-MWS, in 
the sense that it also allocates each server to serve a connected queue that has the largest HOL delay. 
However, there is a key difference. That is, instead of allocating the servers all at once as in D-MWS, 
D-SSG allocates the servers one-by-one, accounting for the scheduling decisions of the servers that are 
allocated earlier. We will show that this critical difference results in a substantial improvement in the 
delay performance. 

We present some additional notation, and then specify the detailed operations of D-SSG. In each
time-slot, there are n rounds, and in each round, one of the remaining servers is allocated. Let $Q_i^k(t)$, $Z_{i,l}^k(t)$ and
$W_i^k(t) = Z_{i,1}^k(t)$ denote the length of queue $Q_i$, the delay of the l-th packet of $Q_i$, and the HOL delay of
$Q_i$ after $k \ge 1$ rounds of server allocation in time-slot t, respectively. In particular, we have $Q_i^0(t) = Q_i(t)$,
$Z_{i,l}^0(t) = Z_{i,l}(t)$, and $W_i^0(t) = W_i(t)$. Recall that $\mathcal{S}_j(t) = \{1 \le i \le n \mid C_{i,j}(t) = 1\}$. Let $\Gamma_j^k(t)$ denote the
set of indices of the queues that are connected to server $S_j$ in time-slot t and have the largest HOL delay at
the beginning of the k-th round in time-slot t, i.e., $\Gamma_j^k(t) = \{i \in \mathcal{S}_j(t) \mid W_i^{k-1}(t) = \max_{l \in \mathcal{S}_j(t)} W_l^{k-1}(t)\}$.
Let $i(j,t)$ denote the index of the queue that is served by server $S_j$ in time-slot t under D-SSG.

Delay-based Server-Side-Greedy (D-SSG) policy: In each time-slot t,

1) Initialize k = 1.

2) In the k-th round, allocate server $S_k$ to serve queue $Q_{i(k,t)}$, where $i(k,t) = \min\{i \mid i \in \Gamma_k^k(t)\}$.
That is, in the k-th round, server $S_k$ is allocated to serve the connected queue that has the largest
HOL delay, breaking ties by picking the queue with the smallest index if there are multiple
such queues. Then, update the length of $Q_{i(k,t)}$ to account for service, i.e., set
$Q_{i(k,t)}^k(t) = \left(Q_{i(k,t)}^{k-1}(t) - C_{i(k,t),k}(t)\right)^+$ and $Q_i^k(t) = Q_i^{k-1}(t)$ for all $i \ne i(k,t)$. Also, update the HOL delay
of $Q_{i(k,t)}$ to account for service, i.e., set $W_{i(k,t)}^k(t) = Z_{i(k,t),1}^k(t) = Z_{i(k,t),2}^{k-1}(t)$ if $Q_{i(k,t)}^k(t) > 0$,
and $W_{i(k,t)}^k(t) = 0$ otherwise, and set $W_i^k(t) = W_i^{k-1}(t)$ for all $i \ne i(k,t)$.

3) Stop if k equals n. Otherwise, increase k by 1 and repeat step 2.
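
The round-by-round allocation above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `d_ssg`, the representation of each queue as a list of packet delays (oldest first), the 0-based indices, unit-capacity ON-OFF channels, and the -1 marker for a server without connected queues are all assumptions made for the example.

```python
from typing import List, Tuple

def d_ssg(queues: List[List[int]], conn: List[List[int]]) -> Tuple[List[int], List[List[int]]]:
    """One time-slot of D-SSG with unit-capacity ON-OFF channels.

    queues[i]  : delays of the packets in queue i, oldest (HOL) packet first
    conn[i][k] : 1 if queue i is connected to server k in this time-slot, else 0
    Returns (schedule, remaining), where schedule[k] is the queue chosen by
    server k (-1 if none is connected) and remaining holds the queues after service.
    """
    remaining = [list(q) for q in queues]          # work on a copy
    n_servers = len(conn[0]) if conn else 0
    schedule = []
    for k in range(n_servers):                     # round k allocates server k
        connected = [i for i in range(len(remaining)) if conn[i][k] == 1]
        if not connected:
            schedule.append(-1)
            continue
        # Current HOL delay after the earlier rounds; an empty queue has HOL delay 0.
        hol = {i: (remaining[i][0] if remaining[i] else 0) for i in connected}
        largest = max(hol.values())
        chosen = min(i for i in connected if hol[i] == largest)  # smallest-index tie-break
        schedule.append(chosen)
        if remaining[chosen]:
            remaining[chosen].pop(0)               # serve the HOL packet
    return schedule, remaining

full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(d_ssg([[3], [5, 4], [5]], full))
```

In this small instance the servers spread over queues 1 and 2 because each round sees the updated HOL delays, and each round scans at most n queues, which is where the n-rounds-times-n-comparisons cost comes from.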

Remark: Note that both D-SSG and D-QSG aim to allocate each server to a queue with the largest 
HOL delay. The key difference between D-SSG and D-QSG is that D-SSG iterates over the servers first 
while D-QSG iterates over the packets/queues first. This key difference leads to the fact that D-SSG is 
simpler to implement and has an even lower complexity $O(n^2)$. Specifically, there are n rounds, and in
each round, it takes at most n comparisons for a server to find a connected queue with the largest HOL delay.

It should be noted that the queue-length-based counterpart of D-SSG, called Q-SSG, has been studied
in [3], H. Under Q-SSG, each server iteratively chooses to serve a connected queue that has the largest
length. It has been shown that Q-SSG not only achieves throughput optimality, but also guarantees a 
positive (queue-length) rate-function. However, their results have the following limitations: 1) a positive 
rate-function may not be good enough, since the gap between the guaranteed positive rate-function and 
the optimal is unclear; 2) good queue-length performance does not necessarily translate into good delay 
performance; 3) their analysis was only carried out for restricted arrival processes that are not only i.i.d. 
across users, but also in time. In contrast, in the following theorem, we show that D-SSG achieves a 
rate-function that is not only positive but also near-optimal (in the sense of (3)) for more general arrival
processes, while guaranteeing throughput optimality. 

Theorem 5: D-SSG policy is throughput-optimal under Assumption 1, and achieves a near-optimal
rate-function as given in (3) under Assumptions 2 and 3.

Theorem 5 follows immediately from the following lemma, which states that D-SSG is equivalent to
D-QSG under the tie-breaking rules specified in this paper. 

Lemma 3: For the same sample path, i.e., same realizations of arrivals and channel connectivity, D- 
QSG and D-SSG pick the same schedule in every time-slot. 

We prove Lemma 3 by induction, and provide the proof in Appendix G. Note that under D-SSG, in
each round, when a server has multiple connected queues that have the largest HOL delay, we break ties
by picking the queue with the smallest index. Presumably, one can take other arbitrary tie-breaking rules.
However, it turns out that directly analyzing the rate-function for a greedy policy from the server side
(like D-SSG) is much more difficult than that for a greedy policy from the queue side (like D-QSG). For
example, as we mentioned earlier, the authors of Q, H were only able to prove a positive (queue-length)
rate-function for Q-SSG in more restricted scenarios. Hence, our choice of the above simple tie-breaking
rule is in fact quite important in leading to the equivalence property in Lemma 3, which plays a key
role in proving the rate-function near-optimality for D-SSG. Nevertheless, we would expect that one can
choose arbitrary tie-breaking rules in practice.

So far, we have shown that our proposed low-complexity greedy policies achieve both throughput 
optimality and rate-function near-optimality. In the next section, we will show through simulations that 
these greedy policies not only exhibit a near-optimal rate-function, but also empirically are virtually 
indistinguishable from the delay-optimal policy DWM in many scenarios. 

VI. Simulation Results 

In this section, we conduct simulations to compare scheduling performance of our proposed greedy 
policies with DWM, D-MWS, and Q-SSG. We simulate these policies in Java and compare the empirical 
probabilities that the largest HOL delay in the system in any given time-slot exceeds an integer threshold 
b, i.e., $P(W(0) > b)$.

For the arrival processes, we consider bursty arrivals that are driven by a two-state Markov chain 
and that are correlated over time. (We obtained similar results for i.i.d. arrivals, and do not report them 
here due to space constraints.) We adopt the same parameter settings as in Q. For each user, there are 
5 packet-arrivals when the Markov chain is in state 1, and there are no arrivals when it is in state 2.
The transition probability of the Markov chain is given by the matrix [0.5, 0.5; 0.1, 0.9], and the state 
transitions occur at the end of each time-slot. The arrivals for each user are correlated over time, but 
they are independent across users. For the channel model, we first assume i.i.d. ON-OFF channels with 
unit capacity, and set q = 0.75. We later consider more general scenarios with heterogeneous users and 
bursty channels that are correlated over time. We run simulations for a system with n servers and n 
users, where $n \in \{10, 20, \dots, 100\}$. The simulation period lasts for 10'' time-slots for each policy and
each system. 
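
The per-user arrival process described above can be sketched in Python as follows. The function names, the 0-based state indices (index 0 for "state 1" with 5 arrivals, index 1 for "state 2" with none), and the default initial state are assumptions made for illustration; the paper's own simulator was written in Java and is not reproduced here.

```python
import random

# Row s holds the probabilities of moving to state index 0 / state index 1.
P = [[0.5, 0.5],
     [0.1, 0.9]]
ARRIVALS = [5, 0]   # packets per time-slot in state index 0 ("state 1") and 1 ("state 2")

def simulate_arrivals(num_slots: int, rng: random.Random, state: int = 1) -> list:
    """Sample one user's arrivals; the state transition occurs at the end of each slot."""
    arrivals = []
    for _ in range(num_slots):
        arrivals.append(ARRIVALS[state])
        state = 0 if rng.random() < P[state][0] else 1
    return arrivals

def stationary_mean_arrivals() -> float:
    """Mean arrivals per slot under the stationary distribution of the two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    pi0 = p10 / (p01 + p10)          # stationary probability of state index 0
    return pi0 * ARRIVALS[0] + (1 - pi0) * ARRIVALS[1]

sample = simulate_arrivals(200000, random.Random(0))
print(sum(sample) / len(sample), stationary_mean_arrivals())
```

The stationary probability of the bursty state is 1/6, so the long-run arrival rate per user is 5/6 packets per time-slot, and the empirical mean of a long sample should be close to that value.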

The results are summarized in Fig. 2, where the complexity of each policy is also labeled. In order to
compare the rate-function $I(b)$ as defined earlier, we plot the probability over the number of channels
or users, i.e., n, for a fixed value of threshold b. The negative of the slopes of the curves can be viewed
as the rate-function for each policy. In Fig. 2, we report the results only for b = 4, and the results are
similar for other values of threshold b. From Fig. 2, we observe that both D-QSG and D-SSG are virtually
indistinguishable from DWM, which is known to be rate-function delay-optimal. This not only supports
our theoretical results that both D-QSG and D-SSG guarantee a near-optimal rate-function, but also
implies that both D-QSG and D-SSG empirically perform very well while enjoying a lower complexity.
Further, we observe that D-SSG consistently outperforms its queue-length-based counterpart, Q-SSG,
despite the fact that in [3], it has been shown through simulations that Q-SSG empirically achieves
near-optimal queue-length performance. This provides further evidence that good queue-length performance
does not necessarily translate into good delay performance. The results also show that D-MWS yields a
zero rate-function, as expected.
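
Reading a rate-function off such a plot amounts to fitting a line to the logarithm of the probability as a function of n and taking the negative of its slope. The sketch below does this with an ordinary least-squares fit; the function name and the synthetic data are assumptions for illustration, and no simulation results from the paper are reproduced.

```python
import math

def estimate_rate_function(ns, probs) -> float:
    """Negative least-squares slope of log(prob) against n.

    Points with zero empirical probability are skipped, since their logarithm
    is undefined (such points arise when no violation was observed).
    """
    pts = [(n, math.log(p)) for n, p in zip(ns, probs) if p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with positive probability")
    mean_n = sum(n for n, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((n - mean_n) ** 2 for n, _ in pts)
    sxy = sum((n - mean_n) * (y - mean_y) for n, y in pts)
    return -sxy / sxx

# Synthetic example: probabilities decaying like 2 * exp(-0.3 n).
ns = list(range(10, 101, 10))
probs = [2 * math.exp(-0.3 * n) for n in ns]
print(estimate_rate_function(ns, probs))
```

A policy whose curve flattens out as n grows would give an estimate near zero, which is the behavior described for D-MWS.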

Further, we evaluate scheduling performance of different policies in more realistic scenarios, where
users are heterogeneous and channels are correlated over time. Specifically, we consider channels that 
can be modeled as a two-state Markov chain, where the channel is "ON" when the Markov chain is in 
state 1, and is "OFF" when it is in state 2. This type of channel model can be viewed as a special case 
of the Gilbert-Elliott model that is widely used for describing bursty channels. We assume that there are
two classes of users: users with an odd index are called near-users, and users with an even index are
called far-users. Different classes of users see different channel conditions: near-users see better channel
conditions, and far-users see worse channel conditions. We assume that the transition probability matrices of
channels for near-users and far-users are [0.833, 0.167; 0.5, 0.5] and [0.5, 0.5; 0.167, 0.833], respectively. 
The arrival processes are assumed to be the same as in the previous case. 
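
The long-run fraction of time each class of channel is ON follows directly from the two transition matrices. The following sketch computes it; the function name and the row convention (row 0 for the ON state, row 1 for the OFF state) are assumptions made here.

```python
def stationary_on_probability(P) -> float:
    """Stationary probability of the ON state of a two-state channel.

    P[0] is the row of the ON state (state 1) and P[1] the row of the OFF
    state (state 2), so P[0][1] is P(ON -> OFF) and P[1][0] is P(OFF -> ON).
    """
    p_on_to_off = P[0][1]
    p_off_to_on = P[1][0]
    return p_off_to_on / (p_on_to_off + p_off_to_on)

near_users = [[0.833, 0.167], [0.5, 0.5]]
far_users = [[0.5, 0.5], [0.167, 0.833]]
print(stationary_on_probability(near_users), stationary_on_probability(far_users))
```

With these matrices a near-user's channel is ON roughly three quarters of the time and a far-user's roughly one quarter of the time, which quantifies the "better" and "worse" channel conditions.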

The results are summarized in Fig. 3. We observe similar results as in the previous case with
homogeneous users and i.i.d. channels in time. In particular, D-QSG and D-SSG exhibit a rate-function that is
the same as that of DWM, although their delay performance is slightly worse. Note that in this scenario, 
a rate-function delay-optimal policy is not known yet. Hence, for future work, it would be interesting to 
understand how to design rate-function delay-optimal or near-optimal policies in general scenarios. 

VII. Conclusion 

In this paper, we developed low-complexity greedy scheduling policies that not only achieve throughput 
optimality, but also guarantee a near-optimal delay rate-function, for multi-channel wireless networks. Our 
studies reveal that throughput optimality is relatively easier to achieve in such multi-channel systems, 
while there exists an explicit trade-off between complexity and delay performance. If one can bear a
minimal drop in the delay performance, lower-complexity scheduling policies can be exploited.

Fig. 2. Performance comparison of different scheduling policies in the case with homogeneous i.i.d. channels, for b = 4. (Horizontal axis: n, the number of users or channels.)

Fig. 3. Performance comparison of different scheduling policies in the case with heterogeneous users and Markov-chain driven
channels, for b = 4. (Horizontal axis: n, the number of users or channels.)

For future work, it would be interesting to explore whether one can find low-complexity scheduling 
policies that can guarantee both throughput and delay optimality. Further, it is still unclear how to design 
scheduling policies (even with a high complexity) that can guarantee optimal or near-optimal delay 
performance in more realistic scenarios. Therefore, it is important to investigate the scheduling problem 
in such multi-channel systems with more general models, e.g., accounting for multi-rate channels that 
are correlated over time, instead of i.i.d. ON-OFF channels, as well as heterogeneous users and channels 
with different statistics. 






Appendix A
The Optimal Throughput Region $\Lambda^*$

We can characterize the optimal throughput region $\Lambda^*$ of our multi-channel systems in a similar manner
to that for single-channel systems as in [9].

We start with a discussion of a single-channel system with n users in a more general scenario.
Specifically, suppose that there is a finite set $\mathcal{M} = \{1, 2, \dots, |\mathcal{M}|\}$ of global server states (where the
server state accounts for the state of the links between the server and all users). For each state $m \in \mathcal{M}$,
there is an associated service rate vector $r^m = [r_i^m, 1 \le i \le n]$, where $r_i^m$ is the maximum number of
packets that can be transmitted to $Q_i$ when the server is in state m (for the i.i.d. ON-OFF channels that we
consider in this paper, we have $r_i^m \in \{0, 1\}$ for all m and i). We assume that the random channel state
process is an irreducible discrete-time Markov chain with state space $\mathcal{M}$. We let $\pi = [\pi_m, m \in \mathcal{M}]$
denote the (unique) stationary distribution of this Markov chain, where $\pi_m > 0$ for all $m \in \mathcal{M}$.

As in [9], consider a Static Service Split (SSS) policy, associated with an $|\mathcal{M}| \times n$ stochastic matrix
$\phi = [\phi_{m,i}, m \in \mathcal{M}, 1 \le i \le n]$, where $\phi_{m,i} \ge 0$ for all m and i, and $\sum_{1 \le i \le n} \phi_{m,i} = 1$ for every m.
Under the SSS policy, the server chooses to serve $Q_i$ with probability $\phi_{m,i}$ when the server is in state m.
Clearly, the (long-term average) service rate vector can be represented by $v = [v_i, 1 \le i \le n] = v(\phi)$,
where $v_i = \sum_{m \in \mathcal{M}} \pi_m \phi_{m,i} r_i^m$. Then, the set of all feasible (long-term average) service rate vectors can
be represented as

$$\mathcal{R} = \{v \mid v = v(\phi) \text{ for some stochastic matrix } \phi\}.$$

Hence, the optimal throughput region can be represented as

$$\Lambda^* = \{\lambda \mid \lambda \le v \text{ for some vector } v \in \mathcal{R}\}.$$
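
The service rate vector of an SSS policy is a weighted sum over the server states, which the following sketch evaluates for a toy example. The function name and all numbers (two users, four equally likely server states, and the particular split matrix) are hypothetical values chosen for illustration, not taken from the paper.

```python
def sss_service_rates(pi, phi, r):
    """Long-term average service rates v_i = sum_m pi[m] * phi[m][i] * r[m][i].

    pi[m]     : stationary probability of server state m
    phi[m][i] : probability that the server picks queue i in state m (rows sum to 1)
    r[m][i]   : packets that can be sent to queue i when the server is in state m
    """
    n = len(r[0])
    return [sum(pi[m] * phi[m][i] * r[m][i] for m in range(len(pi))) for i in range(n)]

# Hypothetical example: two users with independent ON-OFF links, each ON w.p. 0.5.
r = [[1, 1], [1, 0], [0, 1], [0, 0]]        # the four joint link states
pi = [0.25, 0.25, 0.25, 0.25]
phi = [[0.5, 0.5],                           # both ON: split evenly
       [1.0, 0.0],                           # only user 0 ON
       [0.0, 1.0],                           # only user 1 ON
       [0.5, 0.5]]                           # both OFF: the choice is irrelevant
print(sss_service_rates(pi, phi, r))
```

Any arrival rate vector that is componentwise no larger than such a service rate vector, for some stochastic matrix, lies in the throughput region described above.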

Now, consider a multi-channel system with n orthogonal channels. Let $\mathcal{R}_j$ denote the set of all feasible
(long-term average) service rate vectors for server $S_j$, and let $\mathcal{R} = \mathcal{R}_1 \times \mathcal{R}_2 \times \cdots \times \mathcal{R}_n$ denote the set
of all feasible service rate matrices $\mu$ (where the dimension of $\mu$ is $n \times n$, and $\mu_{i,j}$ is a feasible service
rate allocated to $Q_i$ from server $S_j$). Hence, the optimal throughput region of the multi-channel system
can be represented as

$$\Lambda^* = \Big\{\lambda \;\Big|\; \lambda_i \le \sum_{j=1}^{n} \mu_{i,j} \text{ for all } i, \text{ for some matrix } \mu \in \mathcal{R}\Big\}.$$

An arrival rate vector $\lambda$ is strictly inside $\Lambda^*$ if the above inequalities are all strict.

Note that our multi-channel system with the i.i.d. ON-OFF channel model is a special case of the above
scenario.

Appendix B
Proof of Theorem 1

We consider event $\mathcal{E}_1$ and a sequence of events $\mathcal{E}_2^c$, each of which implies that $W(0) > b$.

Event $\mathcal{E}_1$: Suppose that there is a packet that arrives to the network in time-slot $-b-1$. Without loss
of generality, we assume that the packet arrives to queue $Q_1$. Further, suppose that $Q_1$ is disconnected
from all the n servers in all the time-slots from $-b-1$ to $-1$.

Then, at the beginning of time-slot 0, this packet is still in the network and has a delay of $b+1$. This
implies $\mathcal{E}_1 \subseteq \{W(0) > b\}$. Note that the probability that event $\mathcal{E}_1$ occurs can be computed as

$$P(\mathcal{E}_1) = (1-q)^{n(b+1)} = e^{-n(b+1)I_X}.$$

Hence, we have

$$P(W(0) > b) \ge e^{-n(b+1)I_X},$$

and thus

$$\limsup_{n \to \infty} \frac{-1}{n} \log P(W(0) > b) \le (b+1) I_X.$$

Event $\mathcal{E}_2^c$: Consider any fixed $c \in \{0, 1, \dots, b\}$. Fix any $\epsilon > 0$, and choose t such that
$I_{AG}(t, b-c) \le I_{AG}(b-c) + \epsilon$. Suppose that from time-slot $-t-b$ to $-b-1$, the total number of packet arrivals to the
system is greater than $nt + n(b-c)$, and let $p_{(b-c)}$ denote the probability that this event occurs. Then,
from the definitions of $I_{AG}(t,x)$ and $I_{AG}(x)$, we know

$$\liminf_{n \to \infty} \frac{-1}{n} \log p_{(b-c)} = I_{AG}(t, b-c) \le I_{AG}(b-c) + \epsilon.$$

Clearly, the total number of packets that are served in any time-slot is no greater than n. Hence, at the
end of time-slot $-b-1$, there are at least $n(b-c)+1$ packets remaining in the system. Moreover, at
the end of time-slot $-c-1$, the system contains at least one packet that arrived before time-slot $-b$.
Without loss of generality, we assume that this packet is in $Q_1$. Now, assume that $Q_1$ is disconnected
from all the n servers in the next c time-slots, i.e., from time-slot $-c$ to $-1$. This occurs with probability
$(1-q)^{cn} = e^{-ncI_X}$, independently of all the past history. Hence, at the beginning of time-slot 0, there
is still a packet that arrived before time-slot $-b$. Hence, we have $W(0) > b$ in this case. This implies
$\mathcal{E}_2^c \subseteq \{W(0) > b\}$. Note that the probability that event $\mathcal{E}_2^c$ occurs can be computed as

$$P(\mathcal{E}_2^c) = p_{(b-c)} e^{-ncI_X}.$$

Hence, we have

$$P(W(0) > b) \ge p_{(b-c)} e^{-ncI_X},$$

and thus

$$\limsup_{n \to \infty} \frac{-1}{n} \log P(W(0) > b) \le I_{AG}(b-c) + \epsilon + cI_X.$$

Since the above inequality holds for any $c \in \{0, 1, \dots, b\}$ and all $\epsilon > 0$, by letting $\epsilon$ tend to 0 and taking
the minimum over all $c \in \{0, 1, \dots, b\}$, we have

$$\limsup_{n \to \infty} \frac{-1}{n} \log P(W(0) > b) \le \min_{c \in \{0,1,\dots,b\}} \{I_{AG}(b-c) + cI_X\}.$$

Considering both event $\mathcal{E}_1$ and events $\mathcal{E}_2^c$, we have

$$\limsup_{n \to \infty} \frac{-1}{n} \log P(W(0) > b) \le \min\Big\{ \min_{c \in \{0,1,\dots,b\}} \{I_{AG}(b-c) + cI_X\},\ (b+1) I_X \Big\}.$$

Appendix C
Proof of Lemma 1

We divide the proof into two parts (Lemmas 4 and 5).

Lemma 4: Consider any function $f(n) < \frac{n}{2}$, which is strictly increasing with n. Then, there exists a
finite integer $N_{X1} > 0$ such that for all $n \ge N_{X1}$, with probability no smaller than $1 - (1-q)^{n-2f(n)}$,
D-QSG schedules all the oldest $f(n)$ packets in the system.

Proof: Let $x_k$ denote the k-th oldest packet in the system, and let $\mathcal{P}_k = \{x_r \mid r \in \{1, 2, \dots, k\}\}$
denote the set of the oldest k packets, where $k \in \{1, 2, \dots\}$; in particular, set $\mathcal{P}_0 = \emptyset$.

From the operations of D-QSG, it is easy to see that

$$P(\text{Packet } x_k \text{ is scheduled} \mid \text{All the packets in set } \mathcal{P}_{k-1} \text{ are scheduled}) = 1 - (1-q)^{n-k+1}.$$

Then, we choose $N_{X1}$ such that $f(n) \le \left(\frac{1}{1-q}\right)^{f(n)+1}$ for all $n \ge N_{X1}$, and show that for all $n \ge N_{X1}$,
with probability no smaller than $1 - (1-q)^{n-2f(n)}$, the packets in set $\mathcal{P}_{f(n)}$ (i.e., all the oldest $f(n)$
packets) are scheduled:

$$\begin{aligned}
P(\text{All the packets in set } \mathcal{P}_{f(n)} \text{ are scheduled})
&= \prod_{k=1}^{f(n)} P(\text{Packet } x_k \text{ is scheduled} \mid \text{All the packets in set } \mathcal{P}_{k-1} \text{ are scheduled}) \\
&= \prod_{k=1}^{f(n)} \left(1 - (1-q)^{n-k+1}\right) \\
&\ge \left(1 - (1-q)^{n-f(n)+1}\right)^{f(n)} \\
&\overset{(a)}{\ge} 1 - f(n)(1-q)^{n-f(n)+1} \\
&\overset{(b)}{\ge} 1 - (1-q)^{n-2f(n)},
\end{aligned}$$

where (a) is from Bernoulli's inequality, and (b) is from our choice of $N_{X1}$. ■
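
The chain of bounds in this proof can be checked numerically for a small instance. The sketch below assumes the conditional scheduling probability $1 - (1-q)^{n-k+1}$ for the k-th oldest packet, a form partly reconstructed from the garbled source; the function name and parameter values are illustrative.

```python
import math

def lemma4_bounds(n: int, f: int, q: float):
    """Return (exact product, uniform bound, Bernoulli bound, final bound).

    exact     : prod_{k=1..f} (1 - (1-q)^(n-k+1))
    uniform   : (1 - (1-q)^(n-f+1))^f, using the smallest factor for every k
    bernoulli : 1 - f (1-q)^(n-f+1), from Bernoulli's inequality
    final     : 1 - (1-q)^(n-2f), valid when f <= (1/(1-q))^(f+1)
    """
    exact = math.prod(1 - (1 - q) ** (n - k + 1) for k in range(1, f + 1))
    uniform = (1 - (1 - q) ** (n - f + 1)) ** f
    bernoulli = 1 - f * (1 - q) ** (n - f + 1)
    final = 1 - (1 - q) ** (n - 2 * f)
    return exact, uniform, bernoulli, final

# Small hypothetical instance so that the four values are distinguishable in floating point.
print(lemma4_bounds(n=20, f=5, q=0.5))
```

Each value in the returned tuple is no larger than the one before it, mirroring the order of the inequalities in the proof.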
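Lemma 5: Consider a frame consisting of n packets. The D-QSG policy is applied to schedule the
packets in the frame. Then, there exists a finite integer $N_{X2} > 0$ such that for all $n \ge N_{X2}$, with
probability no smaller than $1 - (1-q)^n$, D-QSG schedules at least $n - 2\sqrt{n}$ packets from the frame.

Proof: Let $x_k$ denote the k-th oldest packet in the system, and let $\mathcal{P}_k = \{x_r \mid r \in \{1, 2, \dots, k\}\}$
denote the set of the oldest k packets, where $k \in \{1, 2, \dots, n\}$; in particular, set $\mathcal{P}_0 = \emptyset$.

Consider an arbitrary set of $2\sqrt{n}$ packets in the frame, denoted by $\mathcal{H} = \{x_{r_1}, x_{r_2}, \dots, x_{r_{2\sqrt{n}}}\}$, where
$r_i < r_j$ if $i < j$. It is easy to see that there are at least $2\sqrt{n} - i$ packets that are younger than packet
$x_{r_i}$. Then, we must have

$$r_i \le n - (2\sqrt{n} - i), \quad (4)$$

for all $i \in \{1, 2, \dots, 2\sqrt{n}\}$. Let $\mathcal{H}_i = \{x_{r_1}, x_{r_2}, \dots, x_{r_i}\} \subseteq \mathcal{H}$ denote the subset of packets in $\mathcal{H}$ that are
no younger than $x_{r_i}$, and clearly, we have $|\mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1}| = r_i - i$. From the operations of D-QSG, for all
$i \in \{1, 2, \dots, 2\sqrt{n}\}$, we have

$$\begin{aligned}
&P(\text{Packet } x_{r_i} \text{ is not scheduled} \mid \text{All the packets in set } \mathcal{H}_{i-1} \text{ are not scheduled} \\
&\qquad \text{and } r \text{ packets in set } \mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1} \text{ are scheduled}) \\
&= (1-q)^{n-r} \\
&\le (1-q)^{n - |\mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1}|}
\end{aligned}$$

for all $r \in \{0, 1, \dots, |\mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1}|\}$. Then, we have

$$\begin{aligned}
&P(\text{Packet } x_{r_i} \text{ is not scheduled} \mid \text{All the packets in set } \mathcal{H}_{i-1} \text{ are not scheduled}) \\
&= \sum_{r=0}^{|\mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1}|} \Big( P(\text{Packet } x_{r_i} \text{ is not scheduled} \mid \text{All the packets in set } \mathcal{H}_{i-1} \text{ are not scheduled} \\
&\qquad\qquad \text{and } r \text{ packets in the set } \mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1} \text{ are scheduled}) \\
&\qquad\qquad \cdot P(r \text{ packets in the set } \mathcal{P}_{r_i-1} \setminus \mathcal{H}_{i-1} \text{ are scheduled}) \Big) \\
&\le (1-q)^{n - r_i + i},
\end{aligned}$$

and thus,

$$\begin{aligned}
P(\text{All the packets in set } \mathcal{H} \text{ are not scheduled})
&= \prod_{i=1}^{2\sqrt{n}} P(\text{Packet } x_{r_i} \text{ is not scheduled} \mid \text{All the packets in set } \mathcal{H}_{i-1} \text{ are not scheduled}) \\
&\le \prod_{i=1}^{2\sqrt{n}} (1-q)^{n - r_i + i} \\
&\le \prod_{i=1}^{2\sqrt{n}} (1-q)^{n - (n - 2\sqrt{n} + i) + i} \\
&= (1-q)^{4n}, \quad (5)
\end{aligned}$$

where the last inequality is from (4).

Now, we choose $N_{X2}$ such that $n^{2\sqrt{n}} \le \left(\frac{1}{1-q}\right)^{3n}$ for all $n \ge N_{X2}$, and show that for all $n \ge N_{X2}$,
with probability no greater than $(1-q)^n$, some $2\sqrt{n}$ packets are not scheduled:

$$\begin{aligned}
P(\text{Some } 2\sqrt{n} \text{ packets are not scheduled})
&\le \binom{n}{2\sqrt{n}} P(\text{All the packets in a certain set } \mathcal{H} \text{ are not scheduled}) \\
&\le n^{2\sqrt{n}} (1-q)^{4n} \\
&\le (1-q)^{n},
\end{aligned}$$

where the last inequality is due to the choice of $N_{X2}$.
Therefore, we have

$$\begin{aligned}
P(\text{At least } n - 2\sqrt{n} \text{ packets are scheduled})
&= 1 - P(\text{Less than } n - 2\sqrt{n} \text{ packets are scheduled}) \\
&= 1 - P(\text{Greater than } 2\sqrt{n} \text{ packets are not scheduled}) \\
&\ge 1 - P(\text{At least } 2\sqrt{n} \text{ packets are not scheduled}) \\
&= 1 - P(\text{Some } 2\sqrt{n} \text{ packets are not scheduled}) \\
&\ge 1 - (1-q)^{n},
\end{aligned}$$

for all $n \ge N_{X2}$. ■

By applying Lemmas 4 and 5, and choosing $N_X = \max\{N_{X1}, N_{X2}\}$, we show that for all $n \ge N_X$,
with probability no smaller than $1 - 2(1-q)^{n-2f(n)}$, D-QSG schedules at least $n - 2\sqrt{n}$ packets including
the oldest $f(n)$ packets from the frame.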
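
How large n must be for the union bound in Lemma 5 to close depends on q. The sketch below tests the condition $n^{2\sqrt{n}} \le (1/(1-q))^{3n}$ in the log domain to avoid overflow; this form of the condition is inferred from the inequality chain (the left-hand side is lost in the source), and the function name and test values are illustrative assumptions.

```python
import math

def union_bound_condition_holds(n: int, q: float) -> bool:
    """Check n^(2 sqrt(n)) <= (1/(1-q))^(3n) by comparing logarithms.

    When this holds, n^(2 sqrt(n)) * (1-q)^(4n) <= (1-q)^n, which is the last
    step of the union bound over all sets of 2 sqrt(n) packets.
    """
    lhs = 2 * math.sqrt(n) * math.log(n)
    rhs = 3 * n * math.log(1 / (1 - q))
    return lhs <= rhs

# With reliable channels the condition already holds for moderate n,
# while poorly connected channels need a much larger n.
print(union_bound_condition_holds(100, 0.75))
print(union_bound_condition_holds(1000, 0.1), union_bound_condition_holds(2500, 0.1))
```

This illustrates why the guarantee is asymptotic: the threshold $N_{X2}$ grows as the channel ON probability q shrinks.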

Appendix D 
Proof of Theorem 3

The proof follows a similar argument to the proof of Theorem 2 in Q.

For ease of analysis, in the G-FBS policy, we choose any fixed real number $\hat{p} \in (p, 1)$, and consider
the arrival process $\hat{A}(\cdot)$, obtained by adding extra dummy arrivals to the original arrival process $A(\cdot)$. The resulting
arrival process $\hat{A}(\cdot)$ is simple, and has the following property:

$$\hat{A}(t) = \begin{cases} \hat{p}n, & \text{if } A(t) \le \hat{p}n, \\ Ln, & \text{if } A(t) > \hat{p}n. \end{cases}$$

For notational ease, we use $A(\cdot)$ to denote the modified arrival process $\hat{A}(\cdot)$ throughout the proof of
Theorem 3.

We start by defining the following notions associated with the G-FBS policy. Let $F(t)$ denote the
number of unserved frames in time-slot t, and let $R(t)$ denote the remaining available space (where the
unit is packet) in the end-of-line (partially-filled) frame at the end of time-slot t. Also, let $X_F(t)$ denote
the indicator function of whether a success occurs in time-slot t. That is, $X_F(t) = 1$ if there is a success,
and $X_F(t) = 0$ otherwise. Then, we can write a recursive equation for $F(t)$:

$$F(t) = \max\left\{ F(t-1) + \left\lceil \frac{A(t) - R(t-1)}{n_0} \right\rceil - X_F(t),\ 0 \right\}, \quad (6)$$

$$R(t) = \mathbb{1}_{\{F(t) > 0\}} \cdot \left( (R(t-1) - A(t)) \bmod n_0 \right). \quad (7)$$
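
The frame bookkeeping can be sketched as a single update step. The sketch assumes the recursion has the form $F(t) = \max\{F(t-1) + \lceil (A(t) - R(t-1))/n_0 \rceil - X_F(t), 0\}$ and $R(t) = 1_{\{F(t)>0\}} \cdot ((R(t-1) - A(t)) \bmod n_0)$, which is reconstructed from the garbled equations; the function name and the example numbers are illustrative.

```python
def g_fbs_frame_step(F_prev: int, R_prev: int, arrivals: int, success: int, n0: int):
    """One step of the frame recursion.

    F_prev   : unserved frames at the end of the previous time-slot
    R_prev   : free space (in packets) in the end-of-line frame, 0 <= R_prev < n0
    arrivals : packets arriving in this time-slot
    success  : 1 if a success occurs in this time-slot, else 0
    n0       : number of packets per frame
    """
    # ceil((arrivals - R_prev) / n0) using integer arithmetic only
    new_frames = -((R_prev - arrivals) // n0)
    F = max(F_prev + new_frames - success, 0)
    # Python's % returns a value in [0, n0) for n0 > 0, matching the mathematical mod.
    R = (R_prev - arrivals) % n0 if F > 0 else 0
    return F, R

state = (0, 0)
for arrivals, success in [(10, 0), (5, 1), (0, 1)]:
    state = g_fbs_frame_step(state[0], state[1], arrivals, success, n0=8)
    print(state)
```

In the example, 10 arrivals open two frames of size 8 with 6 free slots left; the next 5 arrivals fit into that free space, so no new frame is opened while one frame is served.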

Let $M(t) \le 2\sqrt{n}$ denote the number of packets in the L-frame at the beginning of time-slot t, and
let $H(t) \le n_0$ denote the number of packets in the HOL frame at the beginning of time-slot t. Then,
at the beginning of time-slot t, the number of packets in the S-frame is equal to $M(t) + H(t)$. Let
$D(t) \le M(t-1) + H(t-1)$ denote the number of packets served from the S-frame if a success occurs
in time-slot t. Then, we have the following recursive equation for $M(t)$:

$$M(t) = \begin{cases} M(t-1) + H(t-1) - D(t-1), & \text{if } X_F(t-1) = 1, \\ M(t-1), & \text{otherwise.} \end{cases}$$

Also, we let

$$X_F(t_1, t_2) = \sum_{\tau = t_1}^{t_2} X_F(\tau)\, \mathbb{1}_{\{\{F(\tau) > 0\} \cup \{M(\tau) > 0\}\}}$$

denote the total number of successes in the interval from time-slot $t_1$ to $t_2$ when the S-frame is
non-empty (i.e., the number of unserved frames is greater than zero or the L-frame is non-empty).

Note that the arriving time of a frame is the time when its first packet arrives. Let $R_0 = R(t_1 - 1)$
denote the empty space in the end-of-line frame at the end of time-slot $t_1 - 1$. Then, we let $A_F^{R_0}(t_1, t_2)$
denote the number of new frames that arrive from time-slot $t_1$ to $t_2$. When $R_0 = 0$, we use $A_F(t_1, t_2)$
to denote $A_F^{0}(t_1, t_2)$ for notational convenience.

Let $L(-b)$ be the last time before $-b$ when the number of unserved frames is equal to zero. Then, given that $L(-b) = -t-b-1$, where $t > 0$, the number of unserved frames never becomes zero during the interval $[-t-b, -b-1]$. Let $U(0)$ denote the indicator of whether at time-slot 0 the L-frame contains a packet that arrives before time-slot $-b$, i.e., $U(0) = 1$ if at time-slot 0 the L-frame contains a packet that arrives before time-slot $-b$, and $U(0) = 0$ otherwise. Let $\mathcal{E}_t^{\alpha}$ denote the event that the number of frames that arrive during the interval $[-t-b, -b-1]$ is greater than the total number of successes during the interval $[-t-b, -1]$ when the S-frame is non-empty, i.e.,

$$\mathcal{E}_t^{\alpha} = \{A_F(-t-b,-b-1) > X_F(-t-b,-1)\},$$

and let $\mathcal{E}_t^{\beta}$ denote the event that the number of frames that arrive during the interval $[-t-b, -b-1]$ is equal to the total number of successes during the interval $[-t-b, -1]$ when the S-frame is non-empty, and at time-slot 0 the L-frame contains a packet that arrives before time-slot $-b$, i.e.,

$$\mathcal{E}_t^{\beta} = \{A_F(-t-b,-b-1) = X_F(-t-b,-1),\ U(0) = 1\}.$$

Letting $\mathcal{E}_t = \mathcal{E}_t^{\alpha} \cup \mathcal{E}_t^{\beta}$, we have

$$\{L(-b) = -t-b-1,\ W(0) > b\} = \{L(-b) = -t-b-1,\ \mathcal{E}_t\}. \tag{8}$$

By taking the union over all possible values of $L(-b)$ and applying the union bound, we have

$$\mathbb{P}(W(0) > b) \le \sum_{t=1}^{\infty} \mathbb{P}(L(-b) = -t-b-1,\ \mathcal{E}_t).$$

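We fix a finite time $t^*$, whose value will be specified later in (18). Then, we split the summation as

$$\mathbb{P}(W(0) > b) \le P_1 + P_2,$$

where

$$P_1 \triangleq \sum_{t=1}^{t^*} \mathbb{P}(L(-b) = -t-b-1,\ \mathcal{E}_t), \qquad P_2 \triangleq \sum_{t=t^*}^{\infty} \mathbb{P}(L(-b) = -t-b-1,\ \mathcal{E}_t).$$
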

We now fix $I_0 \triangleq \min\{bI_X, \min_{0 \le c \le b-1}\{I_{AG}(b-1-c) + cI_X\}\}$. Hence, $I_0 \ge I^*(b-1)$. Consider any fixed $\epsilon > 0$, and define $I_0^{\epsilon} = \min\{bI_X, \min_{0 \le c \le b-1}\{I_{AG}(b-1-c) - \epsilon + cI_X\}\}$. Hence, $\lim_{\epsilon \to 0} I_0^{\epsilon} = I_0$. Let $g(n)$ be a function such that $g(n) \in \omega(f(n))$ and $g(n) \in o(n)$. We divide the proof into two parts. In Part 1, we show that there exists a finite $N_1 > 0$ such that for all $n \ge N_1$, we have

$$P_1 \le e^{g(n)} e^{-nI_0^{\epsilon}}.$$

In Part 2, we show that there exists a finite $N_2 > 0$ such that for all $n \ge N_2$, we have

$$P_2 \le 4e^{-nI_0^{\epsilon}}.$$

Finally, combining both parts, we have

$$\mathbb{P}(W(0) > b) \le \left(e^{g(n)} + 4\right) e^{-nI_0^{\epsilon}}$$

for all $n \ge N \triangleq \max\{N_1, N_2\}$. Since $g(n) \in o(n)$, by letting $\epsilon$ tend to 0, and taking the logarithm and the limit as $n$ goes to infinity, we obtain $\liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{P}(W(0) > b) \ge I_0$, and thus the desired results.
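
The bound $I_0 = \min\{bI_X, \min_{0 \le c \le b-1}\{I_{AG}(b-1-c) + cI_X\}\}$ is simple to evaluate numerically once $I_X$ and $I_{AG}(\cdot)$ are known. The sketch below is illustrative only: the function name, the linear `i_ag_example`, and the numeric values are hypothetical placeholders, not quantities from the paper.

```python
def rate_lower_bound(b, i_x, i_ag, eps=0.0):
    """Evaluate min{b*I_X, min_{0<=c<=b-1} [I_AG(b-1-c) - eps + c*I_X]}.

    b    : integer delay threshold (b >= 1)
    i_x  : the constant I_X
    i_ag : callable returning I_AG(x) for a non-negative integer x
    eps  : the slack epsilon; eps = 0 gives I_0, eps > 0 gives I_0^eps
    """
    inner = min(i_ag(b - 1 - c) - eps + c * i_x for c in range(b))
    return min(b * i_x, inner)


def i_ag_example(x):
    """Hypothetical arrival rate function, used only for illustration."""
    return 0.3 * (x + 1)


print(rate_lower_bound(3, 0.5, i_ag_example))            # I_0
print(rate_lower_bound(3, 0.5, i_ag_example, eps=0.01))  # I_0^eps
```

Evaluating with a small positive `eps` and with `eps=0` shows the convergence $I_0^{\epsilon} \to I_0$ used in the last step of the argument.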
We next present a lemma that will be used in the proof. 

Lemma 6: Let $A(\cdot)$ be an arrival process. Consider the interval $[t_1, t_2]$, and let $R_0$ be the empty space in the end-of-line frame at the end of time-slot $t_1 - 1$. Under the G-FBS policy, suppose that the number of unserved frames never becomes zero during the interval $[t_1, t_2]$, i.e., $F(\tau) > 0$ for all $\tau \in [t_1, t_2]$. Then, the following holds:

$$A_F^{R_0}(t_1,t_2) = \left\lceil \frac{A(t_1,t_2) - R_0}{n_0} \right\rceil \le \left\lceil \frac{A(t_1,t_2)}{n_0} \right\rceil,$$

$$R(t_2) = (R_0 - A(t_1,t_2)) \bmod n_0.$$

Remarks: The condition of the above lemma implies that every frame that arrives in the interval $[t_1, t_2]$ (except for the last frame) has exactly $n_0$ packets. The proof follows an inductive argument similar to the proof
of Lemma 8 in Q, and is thus omitted. 
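
The counting in Lemma 6 can be checked with a small simulation. This is a sketch under a stated assumption, not the authors' construction: arriving packets are assumed to first fill the empty space $R_0$ of the end-of-line frame, and a new frame of capacity $n_0$ is opened only when that frame is full. Under that reading, the number of new frames is $\lceil (A - R_0)/n_0 \rceil$ and the remaining empty space is $(R_0 - A) \bmod n_0$. The function names are hypothetical.

```python
def fill_frames(arrivals, r0, n0):
    """Pack arriving packets into frames of n0 packets each.

    arrivals : packets arriving in each slot of the interval
    r0       : empty space in the end-of-line frame before the interval
               (assumed to satisfy 0 <= r0 < n0)
    Returns (number of new frames opened, empty space left at the end).
    """
    space, new_frames = r0, 0
    for a in arrivals:
        for _ in range(a):
            if space == 0:  # end-of-line frame is full: open a new frame
                new_frames += 1
                space = n0
            space -= 1
    return new_frames, space


def closed_form(total, r0, n0):
    """Closed-form counterpart: ceil((A - R0)/n0) and (R0 - A) mod n0."""
    return -((r0 - total) // n0), (r0 - total) % n0


arrivals = [3, 0, 5, 2]  # hypothetical per-slot arrivals, A = 10
print(fill_frames(arrivals, r0=2, n0=4), closed_form(sum(arrivals), 2, 4))
```

The integer expression `-((r0 - total) // n0)` computes the ceiling without floating-point division.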

Part 1: Consider any $t \in \{1, 2, \ldots, t^*\}$. Let $\mathcal{E}_t^{1}$ be the set of sample paths in which $L(-b) = -t-b-1$ and $\mathcal{E}_t$ occurs, and let $\mathcal{E}_t^{2}$ be the set of sample paths in which $\left\lceil \frac{A(-t-b,-b-1)}{n} \right\rceil + 1 - \sum_{\tau=-t-b}^{-1} X_F(\tau) > 0$.

Let $N_0$ be such that $f(n) > 2(t^* + b)\sqrt{n}$ for all $n \ge N_0$.

First, we want to show that for all $n \ge N_0$, we have

$$\mathcal{E}_t^{1} \subseteq \mathcal{E}_t^{2}. \tag{9}$$

For every sample path in set $\mathcal{E}_t^{1}$, $L(-b) = -t-b-1$ is the last time before $-b-1$ when the number of unserved frames is equal to zero. Then, the number of unserved frames never becomes zero during the interval $[-t-b, -b-1]$, i.e.,

$$F(\tau) > 0 \ \text{ for all } \tau \in [-t-b, -b-1]. \tag{10}$$



This implies that the condition of Lemma 6 holds, and hence,

$$A_F(-t-b,-b-1) = \left\lceil \frac{A(-t-b,-b-1)}{n_0} \right\rceil. \tag{11}$$


Moreover, we have $F(\tau) > 0$ or $M(\tau) > 0$ for all $\tau \in [-b, -1]$. Otherwise, there must exist one time-slot $\tau' \in [-b, -1]$ such that $F(\tau') = 0$ and $M(\tau') = 0$, which implies $W(0) \le b$ and thus contradicts (8). This, along with (10), implies

$$X_F(-t-b,-1) = \sum_{\tau=-t-b}^{-1} X_F(\tau)\,\mathbf{1}_{\{\{F(\tau)>0\}\cup\{M(\tau)>0\}\}} = \sum_{\tau=-t-b}^{-1} X_F(\tau). \tag{12}$$

Hence, for any sample path in set $\mathcal{E}_t^{1}$, we have $\left\lceil \frac{A(-t-b,-b-1)}{n_0} \right\rceil \ge \sum_{\tau=-t-b}^{-1} X_F(\tau)$. Let $l(n)$ denote the number of packets in the $\left(\sum_{\tau=-t-b}^{-1} X_F(\tau)\right)$-th frame that arrives in the interval $[-t-b, -b-1]$. We want to show that for any sample path in set $\mathcal{E}_t^{1}$, we have

$$l(n) \ge f(n) - 2\sqrt{n}. \tag{13}$$

Now, recall that for every sample path in set $\mathcal{E}_t^{1}$, there are two cases: either $\mathcal{E}_t^{\alpha}$ or $\mathcal{E}_t^{\beta}$ occurs. We consider these two cases separately.

Case 1) Suppose $\mathcal{E}_t^{\alpha}$ occurs, i.e., $A_F(-t-b,-b-1) > X_F(-t-b,-1)$. Then, from (11) and (12), we have $\left\lceil \frac{A(-t-b,-b-1)}{n_0} \right\rceil > \sum_{\tau=-t-b}^{-1} X_F(\tau)$. Lemma 6 implies that all the frames that arrive in the interval $[-t-b, -b-1]$, except the last frame, are fully filled (with $n_0$ packets). Since the $\left(\sum_{\tau=-t-b}^{-1} X_F(\tau)\right)$-th frame is not the last frame, we have $l(n) = n_0 = n - 2\sqrt{n} \ge f(n) - 2\sqrt{n}$.

Case 2) Suppose $\mathcal{E}_t^{\beta}$ occurs, i.e., $A_F(-t-b,-b-1) = X_F(-t-b,-1)$ and $U(0) = 1$. Then, from (11) and (12), we have $\left\lceil \frac{A(-t-b,-b-1)}{n_0} \right\rceil = \sum_{\tau=-t-b}^{-1} X_F(\tau)$. This implies that the $\left(\sum_{\tau=-t-b}^{-1} X_F(\tau)\right)$-th frame is the last frame. Now, suppose that $l(n) < f(n) - 2\sqrt{n}$. Then, the last success in the interval $[-t-b, -1]$ would have completely cleared the $l(n)$ packets, as they are among the oldest $f(n)$ packets. This implies $U(0) = 0$, which contradicts the assumption that $\mathcal{E}_t^{\beta}$ occurs. Therefore, we must have $l(n) \ge f(n) - 2\sqrt{n}$.

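Combining both cases, we show that (13) holds for any sample path in set $\mathcal{E}_t^{1}$.

Recall that for any sample path in set $\mathcal{E}_t^{1}$, Lemma 6 implies that all the frames that arrive in the interval $[-t-b, -b-1]$, except the last frame, are fully filled (with $n_0$ packets). Also, we know that $\left\lceil \frac{A(-t-b,-b-1)}{n_0} \right\rceil \ge \sum_{\tau=-t-b}^{-1} X_F(\tau)$. Then, we have

$$\begin{aligned} A(-t-b,-b-1) &\ge n_0\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) - 1\right) + l(n) \\ &= n\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) - 1\right) + l(n) - 2\sqrt{n}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) - 1\right) \\ &> n\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) - 1\right) \end{aligned}$$

for all $n \ge N_0$, where the last inequality is from (13), our choice of $N_0$, and $\sum_{\tau=-t-b}^{-1} X_F(\tau) \le t^* + b$.

Hence, we have $\left\lceil \frac{A(-t-b,-b-1)}{n} \right\rceil \ge \sum_{\tau=-t-b}^{-1} X_F(\tau)$, and thus $\left\lceil \frac{A(-t-b,-b-1)}{n} \right\rceil + 1 - \sum_{\tau=-t-b}^{-1} X_F(\tau) > 0$.

This implies $\mathcal{E}_t^{1} \subseteq \mathcal{E}_t^{2}$ for all $n \ge N_0$.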
Next, we calculate the probability that event $\mathcal{E}_t^{2}$ occurs. Specifically, we want to show that there exists a finite $N_1 > 0$ such that for all $n \ge N_1$ and for all $t \le t^*$, we have

$$\mathbb{P}(\mathcal{E}_t^{2}) \le \frac{e^{g(n)}}{t^*}\, e^{-nI_0^{\epsilon}}.$$

Recall that $I_{AG}(t,x) = \liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{P}(A(-t+1,0) > n(t+x))$ and $I_{AG}(x) = \inf_{t>0} I_{AG}(t,x)$. Hence, for any fixed $\epsilon > 0$, there exists a finite $N_3$ such that for all $n \ge N_3$, we have

$$\mathbb{P}(A(-t+1,0) > n(t+x)) \le e^{-n(I_{AG}(t,x) - \epsilon)} \le e^{-n(I_{AG}(x) - \epsilon)}. \tag{14}$$

We next calculate an upper bound on the probability that during the interval $[-t-b, -1]$, there are exactly $t + a$ successes, for some $a \le b$. Recall from Lemma [U that for all $n \ge N_X$, the probability of a success in each time-slot is no smaller than $1 - 2(1-q)^{n-2f(n)}$. Hence, we have

$$\mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = t + a\right) \le 2^{t+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} e^{-n(b-a)I_X}. \tag{15}$$

It is easy to observe that the right-hand side is a monotonically increasing function in $a$.

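We choose $N_4 > 0$ such that $(t^* + b + 1)\,2^{t^*+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} \le \frac{e^{g(n)}}{t^*}$ for all $n \ge N_4$, and choose $N_1 = \max\{N_0, N_X, N_3, N_4\}$. Using the results from (14) and (15), we have that for all $n \ge N_1$,

$$\begin{aligned}
\mathbb{P}(\mathcal{E}_t^{2}) &= \mathbb{P}\left(\left\lceil \frac{A(-t-b,-b-1)}{n} \right\rceil + 1 - \sum_{\tau=-t-b}^{-1} X_F(\tau) > 0\right) \\
&= \sum_{a=0}^{t+b} \mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = a\right) \mathbb{P}(A(-t-b,-b-1) > (a-1)n) \\
&\le (t+b+1) \max_{0 \le a \le t+b} \left\{ \mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = a\right) \mathbb{P}(A(-t-b,-b-1) > (a-1)n) \right\} \\
&\le (t+b+1) \max\Bigg\{ \max_{a \in \{0,1,\ldots,t\}} \mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = a\right), \\
&\qquad \max_{a \in \{1,\ldots,b\}} \left\{ \mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = t+a\right) \mathbb{P}(A(-t-b,-b-1) > (t+a-1)n) \right\} \Bigg\} \\
&\overset{(a)}{\le} (t+b+1) \max\Bigg\{ 2^{t+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} e^{-nbI_X}, \\
&\qquad \max_{a \in \{1,\ldots,b\}} \left\{ \mathbb{P}\left(\sum_{\tau=-t-b}^{-1} X_F(\tau) = t+a\right) \mathbb{P}(A(-t-b,-b-1) > (t+a-1)n) \right\} \Bigg\} \\
&\overset{(b)}{\le} (t+b+1) \max\Bigg\{ 2^{t+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} e^{-nbI_X}, \\
&\qquad \max_{a \in \{1,\ldots,b\}} \left\{ 2^{t+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} e^{-n(b-a)I_X}\, e^{-n(I_{AG}(a-1) - \epsilon)} \right\} \Bigg\} \\
&\le (t+b+1)\, 2^{t+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} \max\left\{ e^{-nbI_X},\ e^{-n \min_{a \in \{1,\ldots,b\}} \{I_{AG}(a-1) - \epsilon + (b-a)I_X\}} \right\} \\
&\overset{(c)}{\le} (t^*+b+1)\, 2^{t^*+2b} \left(\frac{1}{1-q}\right)^{2bf(n)} e^{-n \min\{bI_X,\ \min_{c \in \{0,1,\ldots,b-1\}} \{I_{AG}(b-1-c) - \epsilon + cI_X\}\}} \\
&\overset{(d)}{\le} \frac{e^{g(n)}}{t^*}\, e^{-nI_0^{\epsilon}},
\end{aligned}$$

where (a) is from the monotonicity of the right-hand side of (15), (b) is from (14) and (15), (c) is from changing variable by setting $c = b - a$, and (d) is from our choice of $N_1$.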

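Summing over $t = 1$ to $t^*$, we have

$$P_1 = \sum_{t=1}^{t^*} \mathbb{P}(L(-b) = -t-b-1,\ \mathcal{E}_t) \le \sum_{t=1}^{t^*} \mathbb{P}(\mathcal{E}_t^{2}) \le e^{g(n)} e^{-nI_0^{\epsilon}}$$

for all $n \ge N_1$.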

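Part 2: We want to show that there exists a finite $N_2 > 0$ such that for all $n \ge N_2$, we have

$$P_2 \le 4e^{-nI_0^{\epsilon}}.$$

Recall that the arrival process satisfies the property (5). Let $B = \{b_1, b_2, \ldots, b_{|B|}\}$ be the set of time-slots in the interval from $-t-b$ to $-b-1$ when $A(\tau) = Ln$. Given $L(-b) = -t-b-1$, from Lemma 6 we have

$$\begin{aligned}
A_F(-t-b,-b-1) &\le \sum_{r=1}^{|B|-1} \left\lceil \frac{A(b_r+1,\, b_{r+1}-1)}{n_0} \right\rceil + \left\lceil \frac{A(-t-b,\, b_1-1)}{n_0} \right\rceil + \sum_{r=1}^{|B|} \left\lceil \frac{A(b_r, b_r)}{n_0} \right\rceil + \left\lceil \frac{A(b_{|B|}+1,\, -b-1)}{n_0} \right\rceil \\
&\le \sum_{r=1}^{|B|-1} \left\lceil \frac{(b_{r+1}-1-b_r)\,pn}{n_0} \right\rceil + \sum_{r=1}^{|B|} \left\lceil \frac{Ln}{n_0} \right\rceil + \left\lceil \frac{(b_1+t+b)\,pn}{n_0} \right\rceil + \left\lceil \frac{(-b-1-b_{|B|})\,pn}{n_0} \right\rceil \\
&\le \frac{(t-|B|)\,pn}{n_0} + |B|\frac{Ln}{n_0} + 2|B| + 1 \\
&\le \frac{n}{n_0}\left(pt + (L+2)|B| + 1\right).
\end{aligned}$$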

From Assumption |3] on the arrival process we know that for large enough n and t, \B\ can be made 



less than an arbitrarily small fraction of t. Further, we can show that for n > 



,t > 



18(l+p) 
(2+p)(l-p) 



and \B\ < gfX+W*' we have Api—t -6,-6-1) < (^y^)t — 1. This is derived by substituting the values 



2+P^ 
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of n,t and \B\ in the equation above, 

AFi-t-b,-b- 1) 



Tl 

< —{pt + {L + 2)\B\ + l) 
no 

2 + pf l-p l-p l + 2p . 

< pt H 1 + 1 

1 + 2py 6 ^6 2 + p' 

2 + p /l + 2p l + 2p' 



l + 2p\ 3 2 + p 

<(^)*-i. 



Then, it follows that 



P{AFi-t - 6, -6 - 1) > - 1, L{-b) = -t-b-l) 

3 

= 1 - P(^i.(-t - 6, -6 - 1) < - 1, L{-b) = -t-b-l) 

for all n > iV5 ^ max{iVs(j5-p, ^i^), (^)n and t > Ti ^ max{TB(p-p, g^^), (^^g^}- 
We now state a lemma that will be used in the proof. 
Lemma 7: Let Xi be a sequence of binary random variables satisfying 

P{Xi = 0) < c(n)e-"'^, for all i, 

where c(n) is a polynomial in n of finite degree. Let N' be such that c(n) < e~ for all n > N' . Then, 
for any real number a G (0, 1), we have 

P Xi<{l- a)?j < 

for all n > max{i|,iV'}. 

Proof: The proof follows immediately from Lemma 1 of Q. ■ 
Moreover, we know from Lemma [U that for each r, Xp{t) = with probability less than 2(1 — 
q^n-2fin) ^ 2(^)2/(")e-"^^ for all n > Nx- Choose Nq such that 2{j^ff^"'') < for all 






n 



> Nq. Hence, choosing Nj = ma.yi{Nx , Nq, (i„ j)/^ } and using Lemma |71 we have 

P(XF{-t -b,-l)< (^)t, L{-b) = -t-b-l) 

< p(Xp{-t - b, -1) < (^)(* + b),L{-b) = -t-b-l) 



< e 



-n(i+fc)(i-£)/x 



-nt 



(i-p)-fx 



< e" 

for all n > iVy and t > 0. 

From (O and (O, we have that for all n > iVs = max{iV5, A^y} and t>Ti, 

P{AF{-t -b,-b + Xpi-t - b, -1) > 0, L{-b) = -t - b - 1) 

< 1 - (1 - e"" 

where Ibx = min{ ^i^^f^ , Is (p - p, ^f^^)}. 
Now, we define 

t* = max i Ti , 



log 2 
Ibx 



Ibx 

Then, summing over all t > t*, we have that for all n > N2 = maxjA'^g, 

00 

p, = j2piL{-b) = -t-b-i,£n 

t=f 
00 

<^P{L{-b) = -t-b-l, 
t=f 

Api-t - 6, -6 - 1) + 1 > Xpi-t - b, -1)) 

00 



< 



-ntls 



< 



t=f 
2g—nt''lBx 



(a) 



-nt'lBx 



< 4e 

(ft) 

where (a) is from our choice of N2, and (b) is from ( fTSl ). 
Combining both parts, the result of the theorem then follows. 



(17) 



(18) 
Appendix E
Proof of Lemma 2

Consider two queueing systems, $Q_1$ and $Q_2$, both of which have the same arrival and channel realizations. We assume that $Q_1$ adopts the D-QSG policy and $Q_2$ adopts the G-FBS policy. We define the weight of a packet $p$ in time-slot $t$ as its delay, i.e., $w(p) = t - t_p$, where $t_p$ denotes the time when packet $p$ arrives to the system. Note that different packets (in the same queue or in different queues) may have the same delay. In order to make each packet in the system have a unique weight, we redefine the
weight of a packet p as w{p) = t — tp + ^'^^i" + ' where qp denotes the index of the queue 

that contains packet $p$ and $x_p$ denotes that packet $p$ is the $x_p$-th arrival to queue $q_p$ in time-slot $t_p$. For two packets $p_1$ and $p_2$, we say $p_1$ is older than $p_2$ if $\hat{w}(p_1) > \hat{w}(p_2)$. Note that, as in Q, we use the weight $\hat{w}(\cdot)$ instead of $w(\cdot)$ for ease of analysis only.
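
One way to see how tie-breaking yields unique weights is a lexicographic sort key built from the delay, the queue index, and the arrival order within a slot. The sketch below illustrates the idea only and is not the paper's exact formula for the redefined weight. In particular, the direction of the tie-break (smaller queue index and earlier arrival count as older) is an assumption, and the function names are hypothetical.

```python
def unique_weight(t, t_p, q_p, x_p):
    """Return a sort key that is distinct for distinct packets.

    t   : current time-slot
    t_p : arrival time-slot of the packet
    q_p : index of the queue holding the packet
    x_p : the packet is the x_p-th arrival to queue q_p in slot t_p
    Ties in delay are broken first by queue index, then by arrival order
    (assumed direction: smaller index and earlier arrival are older).
    """
    return (t - t_p, -q_p, -x_p)


def is_older(p1, p2, t):
    """True if packet p1 = (t_p, q_p, x_p) is older than packet p2."""
    return unique_weight(t, *p1) > unique_weight(t, *p2)


# Two packets with the same delay in different queues still compare strictly.
print(is_older((3, 1, 1), (3, 2, 1), t=5))
```

Because no two packets share the same triple $(t_p, q_p, x_p)$, the key induces a strict total order on packets, which is all the proof requires.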

Let Ri{t) denote the set of packets present in system Qi at the end of time-slot t, for i = 1, 2. Then, it 
suffices to show that Ri{t) C R2{t) for all time-slot t. We let A{t) denote the set of packets that arrive 
in time-slot t, and let A{t) denote the set of packets including the dummy packets added according to 
^ under G-FBS. For notational ease, we let Ai{t) and A2{t) denote A{t) and A{t), respectively. Then 
from we have Ai{t) C A2{t) for all t > 0. Let Xi{t) denote the set of packets that depart system 
Qi at time t, for i = 1, 2. Hence, we have 

$$R_i(t+1) = \left(R_i(t) \cup A_i(t+1)\right) \setminus X_i(t+1), \quad \text{for } i = 1, 2.$$

We proceed by contradiction. Suppose that $R_1(t) \nsubseteq R_2(t)$ for some time-slot $t$. Without loss of generality, we assume that time-slot $\tau$ is the first time such that $R_1(\tau) \nsubseteq R_2(\tau)$ occurs. Hence, there must exist a packet, say $p$, such that $p \in R_1(\tau)$ but $p \notin R_2(\tau)$. Because $\tau$ is the first time when such an event occurs, packet $p$ must depart from system $Q_2$ in time-slot $\tau$, i.e., $p \in X_2(\tau)$.

Let $B_i(v)$ denote the set of packets in $R_i(\tau - 1) \cup A_i(\tau)$ with weight greater than $v$, for $i = 1, 2$. Clearly, we have $B_1(v) \subseteq B_2(v)$ for all $v$, as $R_1(\tau - 1) \subseteq R_2(\tau - 1)$ by assumption and $A_1(\tau) \subseteq A_2(\tau)$. Let $S_i(v)$ denote the set of servers that are chosen to serve packets in $B_i(v)$, for $i = 1, 2$. Clearly, we have $S_1(v) \subseteq S_2(v)$ for all $v$, for the following reason. Suppose that a packet $x_r \in B_1(v)$ is served by some server $S_{j(r)}$ in system $Q_1$ (under D-QSG). Then one of the following must occur in system $Q_2$ (under G-FBS, which runs D-QSG over the oldest $n$ packets): either $x_r$ is also served by server $S_{j(r)}$, or server $S_{j(r)}$ is allocated to serve some packet with a larger weight than packet $x_r$ due to the operations of D-QSG, and thus packet $x_r$ must be served by some other server.

Now, suppose that packet $p$ is not served in system $Q_1$ in time-slot $\tau$. From the operations of D-QSG, we know that packet $p$ must be disconnected from every server in set $\mathcal{S} \setminus S_1(\hat{w}(p))$ in time-slot $\tau$. Hence, packet $p$ must also be disconnected from every server in set $\mathcal{S} \setminus S_2(\hat{w}(p))$ in time-slot $\tau$, as $S_1(\hat{w}(p)) \subseteq S_2(\hat{w}(p))$. This implies that packet $p$ cannot be served in system $Q_2$, which leads to a contradiction and thus completes the proof.

Appendix F 
Proof of Theorem H] 

We first present a sufficient condition for throughput optimality in Lemma 8, and then show that the sufficient condition is satisfied under D-QSG.

Recall that $Q_i(t)$ denotes the length of queue $Q_i$ at the beginning of time-slot $t$ immediately after packet arrivals, $Z_{i,l}(t)$ denotes the delay of the $l$-th packet of $Q_i$ at the beginning of time-slot $t$, $W_i(t) = Z_{i,1}(t)$ denotes the delay of the HOL packet of $Q_i$ at the beginning of time-slot $t$, and $C_{i,j}(t)$ denotes the connectivity between $Q_i$ and $S_j$ in time-slot $t$. Also, recall that $\mathcal{S}_j(t)$ denotes the set of queues being connected to server $S_j$ in time-slot $t$, i.e., $\mathcal{S}_j(t) = \{1 \le i \le n \mid C_{i,j}(t) = 1\}$, and $\Gamma_j(t)$ denotes the subset of queues in $\mathcal{S}_j(t)$ that have the largest weight in time-slot $t$, i.e., $\Gamma_j(t) = \{i \in \mathcal{S}_j(t) \mid W_i(t) = \max_{l \in \mathcal{S}_j(t)} W_l(t)\}$.

Lemma 8: Let $i(j,t)$ be the index of the queue that is served by server $S_j$ in time-slot $t$ under a scheduling policy $\mathbf{P}$. Under Assumption 1, policy $\mathbf{P}$ is throughput-optimal if there exists a constant $M > 0$ such that, in any time-slot $t$ and for all $j \in \{1, 2, \ldots, n\}$, queue $Q_{i(j,t)}$ satisfies $W_{i(j,t)}(t) \ge Z_{r,M}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge M$.

Proof: Suppose that the sufficient condition in Lemma 8 is satisfied under policy $\mathbf{P}$, i.e., there exists a constant $M > 0$ such that in any time-slot $t$ and for all $j \in \{1, 2, \ldots, n\}$, queue $Q_{i(j,t)}$ satisfies $W_{i(j,t)}(t) \ge Z_{r,M}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge M$. We want to show that policy $\mathbf{P}$ can stabilize any arrival rate vector $\lambda$ strictly inside the optimal throughput region $\Lambda^*$.

Let $Y_{i,j}(t)$ denote the service that queue $Q_i$ receives from server $S_j$ in time-slot $t$, i.e., $Y_{i,j}(t) = C_{i,j}(t)$ if server $S_j$ is allocated to serve queue $Q_i$, and $Y_{i,j}(t) = 0$ otherwise. We define the random process describing the behavior of the underlying system as $\mathcal{X} = (X(t), t = 0, 1, 2, \ldots)$, where

$$X(t) \triangleq \{(Z_{i,1}(t), Z_{i,2}(t), \ldots, Z_{i,Q_i(t)}(t)),\ 1 \le i \le n;\ C_{i,j}(t),\ 1 \le i \le n,\ 1 \le j \le n\}.$$

The norm of $X(t)$ is defined as $\|X(t)\| = \sum_{1 \le i \le n} Q_i(t) + \sum_{1 \le i \le n} W_i(t)$. Let $\mathcal{X}^{(x)}$ denote a process $\mathcal{X}$ with an initial condition such that

$$\|X^{(x)}(0)\| = x. \tag{19}$$



The following Lemma was derived in 11161 for continuous-time countable Markov chains, and it follows 
from more general results in ifTTl for discrete-time countable Markov chains. 

Lemma 9: Suppose that there exist a real number $\epsilon > 0$ and an integer $T > 0$ such that for any sequence of processes $\{X^{(x)}(xT),\ x = 1, 2, \ldots\}$, we have

$$\limsup_{x \to \infty} \mathbb{E}\left[\frac{1}{x}\,\|X^{(x)}(xT)\|\right] \le 1 - \epsilon, \tag{20}$$

then the Markov chain $\mathcal{X}$ is stable.

Lemma 9 implies the stability of the network, and a stability criterion of type (20) leads to a fluid limit approach [15] to the stability problem of queueing systems.

In the following, we construct the fluid limit model of the system as in [15]. We assume that the packets present in the system in its initial state $X^{(x)}(0)$ arrived in some of the past time-slots $-(x-1), -(x-2), \ldots, 0$, according to their delays in state $X^{(x)}(0)$. We define another process $\mathcal{Y} = (A, Q, W, Y)$, i.e., a tuple that denotes a list of processes. Clearly, a sample path of $\mathcal{Y}^{(x)}$ uniquely defines the sample path of $\mathcal{X}^{(x)}$. Then, we extend the definition of $\mathcal{Y}$ to each continuous time $t \ge 0$ as $\mathcal{Y}^{(x)}(t) = \mathcal{Y}^{(x)}(\lfloor t \rfloor)$, where $\lfloor t \rfloor$ denotes the integer part of $t$.

Next, we consider a sequence of processes $\{\frac{1}{x_m}\mathcal{Y}^{(x_m)}(x_m \cdot)\}$ that are scaled in both time and space. Then, using the techniques of Theorem 4.1 of [15] or Lemma 1 of [1], we can show that for almost all sample paths and for any sequence of processes $\{\frac{1}{x_m}\mathcal{Y}^{(x_m)}(x_m \cdot)\}$, where $\{x_m\}$ is a sequence of positive integers with $x_m \to \infty$, there exists a subsequence $\{x_{m_l}\}$ with $x_{m_l} \to \infty$ as $l \to \infty$ such that the following convergences hold uniformly over compact (u.o.c.) intervals:

$$\frac{1}{x_{m_l}} \int_0^{x_{m_l} t} A_i^{(x_{m_l})}(\tau)\, d\tau \to \lambda_i t, \tag{21}$$

$$\frac{1}{x_{m_l}} \int_0^{x_{m_l} t} Y_{i,j}^{(x_{m_l})}(\tau)\, d\tau \to \int_0^{t} y_{i,j}(\tau)\, d\tau, \tag{22}$$

$$\frac{1}{x_{m_l}}\, Q_i^{(x_{m_l})}(x_{m_l} t) \to q_i(t). \tag{23}$$

Similarly, the following convergences (which are denoted by "$\Rightarrow$") hold at every continuous point of the limiting function $w_i(t)$:

$$\frac{1}{x_{m_l}}\, W_i^{(x_{m_l})}(x_{m_l} t) \Rightarrow w_i(t). \tag{24}$$

Any set of limiting functions $(q, y, w)$ is called a fluid limit. It is easy to show that the limiting functions are Lipschitz continuous in $[0, \infty)$, and are thus absolutely continuous. Therefore, these limiting functions are differentiable at almost all (scaled) times $t \in [0, \infty)$, which we call regular times. Moreover, the limiting functions satisfy that

$$\sum_{1 \le i \le n} q_i(0) + \sum_{1 \le i \le n} w_i(0) = 1, \tag{25}$$

and that

$$\frac{d}{dt} q_i(t) = \begin{cases} \lambda_i - \sum_j y_{i,j}(t), & \text{if } q_i(t) > 0, \\ \left(\lambda_i - \sum_j y_{i,j}(t)\right)^+, & \text{otherwise.} \end{cases} \tag{26}$$

We then prove the stability of the fluid limit model using a standard Lyapunov technique. We consider a quadratic Lyapunov function in the fluid limit model of the system, and show that the Lyapunov function has a negative drift when its value is greater than 0, which implies that the fluid limit model is stable.

Using a similar argument as in [1], [9], we can show that under policy $\mathbf{P}$, there exists a finite time $T_1 > 0$ such that for all $t \ge T_1$, we have

$$q_i(t) = \lambda_i w_i(t) \tag{27}$$

for all $i$. This linear relation is similar to Little's law and plays a key role in proving the stability of delay-based schemes. We omit the proof of this linear relation for brevity and refer readers to [1], [9]. Let $V(q(t))$ denote the Lyapunov function defined as

$$V(q(t)) \triangleq \frac{1}{2} \sum_{i=1}^{n} \frac{q_i^2(t)}{\lambda_i}. \tag{28}$$

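Suppose that $\lambda$ is strictly inside $\Lambda^*$. Then there exists a vector $\mu \in \mathcal{R}$ such that $\lambda_i < \sum_{j=1}^{n} \mu_{i,j}$ for all $i$. Let $\beta$ denote the smallest difference between $\lambda_i$ and $\sum_{j=1}^{n} \mu_{i,j}$, i.e., $\beta \triangleq \min_{1 \le i \le n} \left(\sum_{j=1}^{n} \mu_{i,j} - \lambda_i\right)$. Clearly, we have $\beta > 0$. It suffices to show that for any $\zeta_1 > 0$, there exist a $\zeta_2 > 0$ and a finite time $T_2 > 0$ such that for all regular times $t \ge T_2$, $V(q(t)) \ge \zeta_1$ implies $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$, where $\frac{D^+}{dt^+} V(q(t)) = \lim_{\delta \downarrow 0} \frac{V(q(t+\delta)) - V(q(t))}{\delta}$. Choose any $T_2 \ge T_1$. Since $q(t)$ is differentiable for all regular times $t \ge T_2$ such that $V(q(t)) > 0$, we can obtain the derivative of $V(q(t))$ as

$$\begin{aligned}
\frac{D^+}{dt^+} V(q(t)) &\overset{(a)}{=} \sum_{i=1}^{n} \frac{q_i(t)}{\lambda_i} \left(\lambda_i - \sum_{j=1}^{n} y_{i,j}(t)\right) \\
&= \sum_{i=1}^{n} w_i(t) \left(\lambda_i - \sum_{j=1}^{n} y_{i,j}(t)\right) \\
&= \sum_{i=1}^{n} w_i(t) \left(\lambda_i - \sum_{j=1}^{n} \mu_{i,j} + \sum_{j=1}^{n} \mu_{i,j} - \sum_{j=1}^{n} y_{i,j}(t)\right) \\
&\overset{(b)}{=} \sum_{i=1}^{n} w_i(t) \left(\lambda_i - \sum_{j=1}^{n} \mu_{i,j}\right) + \sum_{j=1}^{n} \left(\sum_{i=1}^{n} w_i(t)\mu_{i,j} - \sum_{i=1}^{n} w_i(t) y_{i,j}(t)\right),
\end{aligned} \tag{29}$$

where (a) is from (26), and (b) is from (27) along with a little algebra.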

From (27) and (28), we can choose $\zeta_3 > 0$ such that $V(q(t)) \ge \zeta_1$ implies $\max_{1 \le i \le n} w_i(t) \ge \zeta_3$.

Then, in the final result of (29), we can conclude that the first term is bounded. That is,

$$\sum_{i=1}^{n} w_i(t) \left(\lambda_i - \sum_{j=1}^{n} \mu_{i,j}\right) \le -\zeta_3 \min_{1 \le i \le n} \left(\sum_{j=1}^{n} \mu_{i,j} - \lambda_i\right) \le -\zeta_3 \beta \triangleq -\zeta_2 < 0.$$

Therefore, we have that $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$ if the second term in the final result of (29) is non-positive. We show this in the following.

Considering the neighborhood around a fixed (scaled) time $t \ge T_2$, we define $\mathcal{N} = \{\lceil x_{m_l} t \rceil, \lceil x_{m_l} t \rceil + 1, \ldots, \lfloor x_{m_l}(t+\delta) \rfloor\}$, where $\delta$ is a small positive number and $\{x_{m_l}\}$ is a positive subsequence for which the convergence to the fluid limit holds. We omit the superscript $(x_{m_l})$ of the random variables (depending on the choice of the sequence $\{x_{m_l}\}$) throughout the rest of the proof for notational convenience (e.g., we use $Q_i(t)$ to denote $Q_i^{(x_{m_l})}(t)$). We want to show that under policy $\mathbf{P}$, in each time-slot $\tau \in \mathcal{N}$, each server $S_j$ serves a connected queue $Q_{i(j,\tau)}$ having the largest weight in the fluid limits, i.e., $w_{i(j,\tau)}(t) = L_j(\tau) \triangleq \max_{i \in \mathcal{S}_j(\tau)} w_i(t)$ (recall that $\mathcal{S}_j(\tau) = \{1 \le i \le n \mid C_{i,j}(\tau) = 1\}$). Note that the statement holds trivially if $\mathcal{S}_j(\tau) = \emptyset$ or $L_j(\tau) = 0$. Hence, suppose that $\mathcal{S}_j(\tau) \ne \emptyset$ and $L_j(\tau) > 0$. Consider $r, s \in \mathcal{S}_j(\tau)$ such that $w_s(t) = L_j(\tau)$ and $W_r(\tau) = \max_{i \in \mathcal{S}_j(\tau)} W_i(\tau)$. In other words, $Q_s$ is a queue having the largest weight in the fluid limit among all the queues being connected to server $S_j$ in time-slot $\tau$, and $Q_r$ is a queue having the largest weight in the original discrete-time system among all the queues being connected to server $S_j$ in time-slot $\tau$. Note that it is possible that $r = s$. Then, for any time-slot $\tau \in \mathcal{N}$, we have that

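$$\begin{aligned}
W_{i(j,\tau)}(\lceil x_{m_l} t \rceil) &\overset{(a)}{\ge} W_{i(j,\tau)}(\tau) - \left(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil\right) \\
&\overset{(b)}{\ge} Z_{r,M}(\tau) - \left(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil\right) \\
&= Z_{r,M}(\tau) - W_r(\tau) + W_r(\tau) - \left(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil\right) \\
&\overset{(c)}{\ge} Z_{r,M}(\tau) - W_r(\tau) + W_s(\tau) - \left(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil\right) \\
&\overset{(d)}{\ge} Z_{r,M}(\tau) - W_r(\tau) + W_s(\lfloor x_{m_l}(t+\delta) \rfloor) - 2\left(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil\right),
\end{aligned} \tag{30}$$

where (a) and (d) are due to the fact that the HOL delay cannot increase by more than $\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil$ within $\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil$ time-slots, (b) is from the property of policy $\mathbf{P}$ satisfying the sufficient condition, and (c) is due to $W_r(\tau) = \max_{i \in \mathcal{S}_j(\tau)} W_i(\tau)$ and $s \in \mathcal{S}_j(\tau)$. Dividing both sides of the final result of the above equation by $x_{m_l}$ and letting $x_{m_l}$ go to infinity, we have that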

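$$\begin{aligned}
w_{i(j,\tau)}(t) &\overset{(a)}{=} \lim_{l \to \infty} \frac{W_{i(j,\tau)}(x_{m_l} t)}{x_{m_l}} \\
&\overset{(b)}{\ge} \lim_{l \to \infty} \left[\frac{Z_{r,M}(\tau) - W_r(\tau)}{x_{m_l}} + \frac{W_s(\lfloor x_{m_l}(t+\delta) \rfloor)}{x_{m_l}}\right] - 2\delta \\
&\overset{(c)}{=} w_s(t+\delta) - 2\delta,
\end{aligned} \tag{31}$$

where (a) is from the definition of fluid limits, (b) is from (30) and $\lim_{l \to \infty} \frac{\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil}{x_{m_l}} = \delta$, and (c) is because $\lim_{l \to \infty} \frac{Z_{r,M}(\tau) - W_r(\tau)}{x_{m_l}} = 0$, as the SLLN holds for the arrival process and $Z_{r,M}(\tau) - W_r(\tau)$ is the difference of the arrival times of two packets having a finite number of packets in between. Since the above equation holds for any arbitrarily small positive number $\delta$, by letting $\delta$ go to 0 on both sides of the final result of the above equation, we have $w_{i(j,\tau)}(t) \ge w_s(t) = L_j(\tau)$, and in particular, we have $w_{i(j,\tau)}(t) = L_j(\tau)$. This is true for each $j$ and for each $\tau \in \mathcal{N}$. Therefore, under policy $\mathbf{P}$, the service vector $y(t)$ satisfies that

$$\sum_{i=1}^{n} w_i(t)\, y_{i,j}(t) = \max_{\nu \in \mathcal{R}} \sum_{i=1}^{n} w_i(t)\, \nu_{i,j},$$

for all $j \in \{1, 2, \ldots, n\}$. Thus, we have that

$$y(t) \in \arg\max_{\nu \in \mathcal{R}} \sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, \nu_{i,j}, \tag{32}$$

which implies that

$$\sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, \mu_{i,j} \le \sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, y_{i,j}(t). \tag{33}$$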

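Therefore, this shows that $V(q(t)) \ge \zeta_1$ implies $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$ for all $t \ge T_2$. It immediately follows that for any $\zeta > 0$, there exists a finite $T \ge T_2 > 0$ such that $\sum_{1 \le i \le n} q_i(T) \le \zeta$. Further, we have that

$$\sum_{1 \le i \le n} \left(q_i(T) + w_i(T)\right) \le \left(1 + \frac{1}{\min_{1 \le i \le n} \lambda_i}\right) \zeta$$

due to the linear relation (27).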

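Now, consider any fixed sequence of processes $\{\mathcal{X}^{(x)},\ x = 1, 2, \ldots\}$ (for simplicity also denoted by $\{x\}$). From the convergences (21)-(24), we have that for any subsequence $\{x_m\}$ of $\{x\}$, there exists a further subsequence $\{x_{m_l}\}$ such that

$$\lim_{l \to \infty} \frac{1}{x_{m_l}}\, \|X^{(x_{m_l})}(x_{m_l} T)\| = \sum_{1 \le i \le n} \left(q_i(T) + w_i(T)\right) \le \left(1 + \frac{1}{\min_{1 \le i \le n} \lambda_i}\right) \zeta$$

almost surely. This in turn implies (for small enough $\zeta$) that

$$\lim_{x \to \infty} \frac{1}{x}\, \|X^{(x)}(xT)\| \le \left(1 + \frac{1}{\min_{1 \le i \le n} \lambda_i}\right) \zeta \triangleq 1 - \epsilon < 1 \tag{34}$$

almost surely.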

We can show that the sequence $\{\frac{1}{x}\|X^{(x)}(xT)\|,\ x = 1, 2, \ldots\}$ is uniformly integrable, due to the following:

$$\frac{1}{x}\, \|X^{(x)}(xT)\| \le 1 + \frac{1}{x} \sum_{1 \le i \le n} \int_0^{xT} A_i(\tau)\, d\tau + nT$$

and

$$\mathbb{E}\left[1 + \frac{1}{x} \sum_{1 \le i \le n} \int_0^{xT} A_i(\tau)\, d\tau + nT\right] < \infty,$$

where the above finite expectation is from our assumption on the arrival process. Then, the almost sure convergence in (34) along with uniform integrability implies the following convergence in the mean:

$$\limsup_{x \to \infty} \mathbb{E}\left[\frac{1}{x}\, \|X^{(x)}(xT)\|\right] \le 1 - \epsilon.$$

Since the above convergence holds for any sequence of processes $\{X^{(x)}(xT),\ x = 1, 2, \ldots\}$, the condition of type (20) in Lemma 9 is satisfied. This completes the proof of Lemma 8. ■

Then, it remains to show that the sufficient condition in Lemma 8 is satisfied under D-QSG. Let $M = n$. We want to show that in any time-slot $t$ and for all $j \in \{1, 2, \ldots, n\}$, D-QSG allocates server $S_j$ to serve queue $Q_{i(j,t)}$, which satisfies that $W_{i(j,t)}(t) \ge Z_{r,n}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$.

Consider any server $S_j$, and any $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$. Then, queue $Q_r$ has at least one packet left in any of the $n$ rounds before server $S_j$ is allocated, since the other $n - 1$ servers can serve at most $n - 1$ packets of queue $Q_r$. Hence, server $S_j$ must be allocated in one of the $n$ rounds. Say server $S_j$ is allocated in the $k$-th round, and we rewrite $S_j$ as $S_{j(k,t)}$ to indicate that it is allocated in the $k$-th round. Suppose that server $S_{j(k,t)}$ is allocated to serve queue $Q_{i(k,t)}$. Then we have that $W_{i(k,t)}(t) \ge W_{i(k,t)}^{k-1}(t) \ge W_r^{k-1}(t) \ge Z_{r,n}(t)$, where the second inequality is due to the operations of D-QSG, and the last inequality is because $Q_r(t) \ge n$, and the HOL packet of queue $Q_r$ at the beginning of the $k$-th round must have a position no later than the $n$-th packet in queue $Q_r$ at the beginning of time-slot $t$. This implies that the sufficient condition is satisfied under D-QSG. Therefore, D-QSG is an MWF policy and is thus throughput-optimal.

Appendix G 
Proof of Lemma 3

It suffices to prove that for any given system, i.e., for any given set of packets and for any channel realizations, D-QSG and D-SSG pick the same schedule. Suppose that there are $K$ packets in the system. Let $x_k$ denote the $k$-th oldest packet in the system. We want to show that packet $x_k$ is either scheduled by the same server under both D-QSG and D-SSG, or is not scheduled by any server under both D-QSG and D-SSG. We denote the set of the $k$ oldest packets by $\mathcal{P}_k = \{x_r \mid r \le k\}$, and denote the set of the first $k$ servers by $\mathcal{S}_k = \{S_j \mid j \le k\}$. We prove the claim by induction.

Base case: Consider packet $x_1$, i.e., the oldest packet, and consider two cases: under D-QSG, 1) packet $x_1$ is scheduled by a server, denoted by $S_{j(1)}$; 2) packet $x_1$ is not scheduled by any server.

In Case 1), we want to show that packet $x_1$ is also served by the same server $S_{j(1)}$ under D-SSG. Note that packet $x_1$ is the oldest packet in the system and is the first packet to be considered under D-QSG. Since it is served by $S_{j(1)}$, from the tie-breaking rule of D-QSG, we know that the queue that contains packet $x_1$ is disconnected from all the servers in set $\mathcal{S}_{j(1)}$ except server $S_{j(1)}$. Now, we consider the server allocation under D-SSG, which allocates servers one by one in increasing order of the server index. Since all the servers in set $\mathcal{S}_{j(1)}$ except for server $S_{j(1)}$ are disconnected from the queue containing packet $x_1$, these servers cannot be allocated to packet $x_1$ in the first $j(1) - 1$ rounds under D-SSG. In the $j(1)$-th round, D-SSG must allocate server $S_{j(1)}$ to packet $x_1$, since the queue that contains packet $x_1$ has the largest HOL delay among the queues that are connected to server $S_{j(1)}$.

In Case 2), packet $x_1$ is the first packet to be considered under D-QSG, but is not served by any server. This implies that no servers are connected to the queue that contains packet $x_1$. Hence, packet $x_1$ cannot be served under D-SSG either.

Combining the above two cases, we prove the base case.

Induction step: Consider an integer $k \in \{1, 2, \ldots, K-1\}$. Suppose that every packet in set $\mathcal{P}_k$ is either scheduled by the same server under both D-QSG and D-SSG, or is not scheduled by any server under both D-QSG and D-SSG. We want to show that this also holds for every packet in set $\mathcal{P}_{k+1}$. Clearly, it suffices to consider only packet $x_{k+1}$ (i.e., the $(k+1)$-th oldest packet in the system), as the other packets all satisfy the condition by the induction hypothesis. We next consider two cases: under D-QSG, 1) packet $x_{k+1}$ is scheduled by a server; 2) packet $x_{k+1}$ is not scheduled by any server.

In Case 1), suppose that packet $x_{k+1}$ is scheduled by server $S_{j(k+1)}$ under D-QSG. We want to show that packet $x_{k+1}$ is also scheduled by server $S_{j(k+1)}$ under D-SSG. We first show that under D-SSG, packet $x_{k+1}$ cannot be scheduled in the first $j(k+1) - 1$ rounds. Note that under D-QSG, packet $x_{k+1}$ is scheduled by server $S_{j(k+1)}$. This implies that any server in set $\mathcal{S}_{j(k+1)-1}$ is either disconnected from the queue that contains packet $x_{k+1}$ or has already been allocated to packets in set $\mathcal{P}_k$ under D-QSG. This, along with the induction hypothesis, further implies that under D-SSG, in the first $j(k+1) - 1$ rounds, the servers under consideration are either disconnected from the queue that contains packet $x_{k+1}$ or allocated to packets in set $\mathcal{P}_k$. Hence, packet $x_{k+1}$ cannot be scheduled in the first $j(k+1) - 1$ rounds under D-SSG. Next, we want to show that packet $x_{k+1}$ must be scheduled by server $S_{j(k+1)}$ in the $j(k+1)$-th round under D-SSG. Let $\mathcal{P}'_k \subseteq \mathcal{P}_k$ denote the set of packets among the $k$ oldest packets that are not scheduled under both D-QSG and D-SSG. Then, all the packets in set $\mathcal{P}'_k$ must be disconnected from server $S_{j(k+1)}$; otherwise some packet $x_r \in \mathcal{P}'_k$ would be scheduled by server $S_{j(k+1)}$ under D-QSG. On the other hand, the induction hypothesis implies that any packet $x_r \in \mathcal{P}_k \setminus \mathcal{P}'_k$ must be scheduled by some server, denoted by $S_{j(r)}$, under D-SSG, where $j(r) \ne j(k+1)$. Hence, D-SSG does not allocate server $S_{j(k+1)}$ to any packet in set $\mathcal{P}_k$. Therefore, in the $j(k+1)$-th round, D-SSG must allocate server $S_{j(k+1)}$ to packet $x_{k+1}$, since the queue that contains packet $x_{k+1}$ has the largest HOL delay among the queues that are connected to server $S_{j(k+1)}$.

In Case 2), packet $x_{k+1}$ is not scheduled by any server under D-QSG. This implies that the queue that contains packet $x_{k+1}$ is disconnected from all the servers in set $\mathcal{S}_n \setminus \{S_{j(r)} \mid x_r \in \mathcal{P}_k \setminus \mathcal{P}'_k\}$, i.e., the set of available servers when considering packet $x_{k+1}$. On the other hand, the induction hypothesis implies that under D-SSG, all the servers in set $\{S_{j(r)} \mid x_r \in \mathcal{P}_k \setminus \mathcal{P}'_k\}$ are also allocated to packets in set $\mathcal{P}_k \setminus \mathcal{P}'_k$. Hence, packet $x_{k+1}$ cannot be scheduled by any server under D-SSG either.

Combining the above two cases, we prove the induction step. This completes the proof. 
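
The equivalence can also be checked empirically on small random instances. The sketch below is a simplified reading of the two policies as used in this proof, not the authors' implementation: D-QSG considers packets from oldest to youngest and gives each the lowest-index available server connected to its queue, while D-SSG considers servers in increasing index order and gives each the oldest unscheduled packet whose queue is connected to it. Packet weights are assumed distinct, and all function names and parameters are hypothetical.

```python
import random


def d_qsg(packets, conn, n_servers):
    """Queue-side greedy: oldest packet first, lowest-index free connected server.

    packets : list of (weight, queue_index) pairs with distinct weights
    conn    : conn[q][j] is True if queue q is connected to server j
    Returns a dict mapping packet id -> server index.
    """
    free = set(range(n_servers))
    schedule = {}
    for pid in sorted(range(len(packets)), key=lambda i: -packets[i][0]):
        queue = packets[pid][1]
        candidates = [j for j in sorted(free) if conn[queue][j]]
        if candidates:
            schedule[pid] = candidates[0]
            free.remove(candidates[0])
    return schedule


def d_ssg(packets, conn, n_servers):
    """Server-side greedy: servers in index order, each takes the oldest
    unscheduled packet among the queues connected to it."""
    unserved = set(range(len(packets)))
    schedule = {}
    for j in range(n_servers):
        candidates = [p for p in unserved if conn[packets[p][1]][j]]
        if candidates:
            oldest = max(candidates, key=lambda p: packets[p][0])
            schedule[oldest] = j
            unserved.remove(oldest)
    return schedule


def random_instance(rng, n=5, k=12, q=0.4):
    """Draw k packets with distinct weights in n queues, with each
    queue-server link connected independently with probability q."""
    weights = rng.sample(range(1000), k)
    packets = [(w, rng.randrange(n)) for w in weights]
    conn = [[rng.random() < q for _ in range(n)] for _ in range(n)]
    return packets, conn, n


rng = random.Random(0)
packets, conn, n = random_instance(rng)
print(d_qsg(packets, conn, n) == d_ssg(packets, conn, n))
```

Comparing the two returned dictionaries checks both parts of the claim at once: a packet is scheduled under one policy exactly when it is scheduled under the other, and by the same server.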